How to call a named variable in a user-defined formula? Not recognized using dplyr::summarise()

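I'm trying to write a function that uses ci.auc() from the pROC package to extract the confidence interval of the 'area under the curve' estimate for 2 named variables, but it produces an error: Error in model.frame.default(formula = anchor, data = namedvar1, : 'data' must be a data.frame, environment, or list. How can I fix this? Is there a better way to specify which data frame the named variables should be taken from?

The original code works fine: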

library(pROC)

df <- structure(list(anchor1 = c(1, 0, 1, 0, 1, 0), namedvar1 = c(0.603, 
-0.006, 0, 0.263, 0, -0.089), namedvar2 = c(0.150346263678009, 
0.388250731888, -0.2579633906095, 0.2562039253, 0.139948502022, 
-0.267652844)), row.names = c(6L, 7L, 12L, 13L, 19L, 29L), class = "data.frame")


# Base example to extract CI bounds & estimate
 as.numeric (ci.auc ( roc (df$anchor1, df$namedvar1, smooth = FALSE,         
                         direction = "<" ,ci = TRUE, boot.stratified = TRUE )  ) )

# Output looks good:
[1] 0.2908208 0.7777778 1.0000000

Great, so I wrapped the above in a function (I want to do this for several named variables):

### CREATE FUNC TO CALCULATE AUC and 95% CIs

new_roc <- function( df, anchor, na.rm = T) {
  anchor <- enquo(anchor)

  # Calculate and save this information as an object
  dplyr::summarise(df, 
                   # ci.auc() & roc() are from pROC package
                   "Var1 AUC CIs"  = as.numeric (ci.auc (roc (anchor, namedvar1, smooth = FALSE,         
                                                                 direction = "<" ,ci = TRUE, boot.stratified = TRUE )  ) ),
                   "Var2 AUC CIs"  = as.numeric (ci.auc (roc (anchor, namedvar2, smooth = FALSE, 
                                                                 direction = "<" ,ci = TRUE, boot.stratified = TRUE )  ) )
  ) 
}

But when I test it, I get an error!

# Try the function

new_roc(df, anchor1 )

# Error output: 
`Error in model.frame.default(formula = anchor, data = namedvar1,  :'data' must be a data.frame, environment, or list`

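I checked class(df) and it is indeed a data.frame, so I'm not sure what the problem is.

To track down the problem, I tried running the inner code on its own with the data frame specified first, but that doesn't work either: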

 # Doesn't work to pipe the df
df %>% 
  as.numeric (ci.auc (roc (anchor1, namedvar1, smooth = FALSE,         
                                        direction = "<" ,ci = TRUE, boot.stratified = TRUE )  ) )
  
## Produces error

Error in roc(anchor1, namedvar1, smooth = FALSE, direction = "<",  : 
  object 'anchor1' not found
 

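Maybe I'm using dplyr unnecessarily? Is there a different way to specify which data frame the named variables should be taken from? Thanks!

I also tried dropping dplyr and indexing the data frame directly, but that doesn't work either: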

new_roc <- function( df, anchor, na.rm = T) {
  anchor <- enquo(anchor)

  # Calculate and save this information as an object
                   # ci.auc() & roc() are from pROC package
                   "Var1 AUC CIs"  = as.numeric (ci.auc (roc (data[[anchor], data[[namedvar1], smooth = FALSE,         
                                                                 direction = "<" ,ci = TRUE, boot.stratified = TRUE )  ) ),
                   "Var2 AUC CIs"  = as.numeric (ci.auc (roc (data[[anchor], data[[namedvar2], smooth = FALSE, 
                                                                 direction = "<" ,ci = TRUE, boot.stratified = TRUE )  ) )
  ) 
}

# Produces a different Error:
Error in .subset2(x, i, exact = exact) : 
  invalid subscript type 'language' 

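The quosure captured by enquo() has to be unquoted with !! before it is evaluated; alternatively, both steps can be replaced by the {{ }} operator: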

new_roc <- function(df, anchor, na.rm = TRUE) {
  dplyr::summarise(df,
    `Var1 AUC CIs` = as.numeric(ci.auc(roc({{anchor}}, namedvar1, smooth = FALSE,
                                           direction = "<", ci = TRUE, boot.stratified = TRUE))),
    `Var2 AUC CIs` = as.numeric(ci.auc(roc({{anchor}}, namedvar2, smooth = FALSE,
                                           direction = "<", ci = TRUE, boot.stratified = TRUE)))
  )
}

Testing:

new_roc(df, anchor1)
#Setting levels: control = 0, case = 1
#Setting levels: control = 0, case = 1
#  Var1 AUC CIs Var2 AUC CIs
#1    0.2908208    0.0000000
#2    0.7777778    0.3333333
#3    1.0000000    0.9866547

This is the same result as the direct call:
as.numeric (ci.auc ( roc (df$anchor1, df$namedvar1, smooth = FALSE,         
                         direction = "<" ,ci = TRUE, boot.stratified = TRUE )  ) )
#Setting levels: control = 0, case = 1
#[1] 0.2908208 0.7777778 1.0000000
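For completeness, here is a minimal sketch of the `!!` form, since only the `{{ }}` version is spelled out. A plausible reason the first attempt failed: a quosure carries the class "formula", so roc(anchor, namedvar1) was probably dispatched to pROC's formula method with namedvar1 taken as `data`, which would explain the model.frame error. The function name `new_roc_bang` is hypothetical, and the data frame is rebuilt from the values in the question (row names dropped).

```r
library(pROC)
library(dplyr)

# Example data copied from the question (row names omitted)
df <- data.frame(
  anchor1   = c(1, 0, 1, 0, 1, 0),
  namedvar1 = c(0.603, -0.006, 0, 0.263, 0, -0.089),
  namedvar2 = c(0.150346263678009, 0.388250731888, -0.2579633906095,
                0.2562039253, 0.139948502022, -0.267652844)
)

# new_roc_bang is a hypothetical name for the enquo() + !! variant
new_roc_bang <- function(df, anchor) {
  anchor <- enquo(anchor)  # capture the column expression as a quosure
  dplyr::summarise(df,
    # !!anchor unquotes the quosure, so roc() receives the column values
    `Var1 AUC CIs` = as.numeric(ci.auc(roc(!!anchor, namedvar1, smooth = FALSE,
                                           direction = "<", ci = TRUE, boot.stratified = TRUE))),
    `Var2 AUC CIs` = as.numeric(ci.auc(roc(!!anchor, namedvar2, smooth = FALSE,
                                           direction = "<", ci = TRUE, boot.stratified = TRUE)))
  )
}

new_roc_bang(df, anchor1)
```

On dplyr 1.1.0 or later, summarise() warns when an expression returns more than one row per group; reframe() is the documented replacement for that case.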

If you don't want to use the dplyr approach, try:

library(pROC)

new_roc <- function( df, anchor) {
  
data.frame(Var1_AUC_CIs = as.numeric(ci.auc(roc(df[[anchor]], df$namedvar1, smooth = FALSE, direction = "<",
                          ci = TRUE, boot.stratified = TRUE))), 
           Var2_AUC_CIs = as.numeric(ci.auc(roc(df[[anchor]], df$namedvar2, smooth = FALSE, direction = "<",
                          ci = TRUE, boot.stratified = TRUE)))) 
}
new_roc(df, 'anchor1')

#  Var1_AUC_CIs Var2_AUC_CIs
#1    0.2908208    0.0000000
#2    0.7777778    0.3333333
#3    1.0000000    0.9866547
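Since the question mentions wanting this for several named variables, the base R version can be extended to take the predictor names as a character vector. This is only a sketch: `auc_ci_table` and its `predictors` argument are hypothetical names, the row labels are added for readability, and the data frame is rebuilt from the values in the question.

```r
library(pROC)

# Example data copied from the question (row names omitted)
df <- data.frame(
  anchor1   = c(1, 0, 1, 0, 1, 0),
  namedvar1 = c(0.603, -0.006, 0, 0.263, 0, -0.089),
  namedvar2 = c(0.150346263678009, 0.388250731888, -0.2579633906095,
                0.2562039253, 0.139948502022, -0.267652844)
)

# auc_ci_table is a hypothetical helper: one column of (lower, AUC, upper)
# per predictor name, with column names passed as strings
auc_ci_table <- function(df, anchor, predictors) {
  out <- sapply(predictors, function(p) {
    r <- roc(df[[anchor]], df[[p]], smooth = FALSE, direction = "<")
    as.numeric(ci.auc(r))  # lower bound, AUC estimate, upper bound
  })
  rownames(out) <- c("lower", "auc", "upper")
  as.data.frame(out)
}

auc_ci_table(df, "anchor1", c("namedvar1", "namedvar2"))
```

Passing names as strings keeps the function free of tidy evaluation, at the cost of quoting the column names at the call site.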