Refer to a quoted column name in a function in R

Question

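I would like to use the na_omit function from the collapse package inside a user-defined function. na_omit requires the column name to be passed in quotes as one of its arguments. If I did not need the column name in quotes, I could simply refer to it in double braces, {{col}}, as mentioned in the vignette "Programming with dplyr". If I refer to the column using the glue package, e.g. glue::glue("{col}"), I get an error message.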

Here is a reprex:

my_df <-
  data.frame(
    matrix(
      c(
        "V9G","Blue",
        NA,"Red",
        "J4C","White",
        NA,"Brown",
        "F7B","Orange",
        "G3V","Green"
      ),
      nrow = 6,
      ncol = 2,
      byrow = TRUE,
      dimnames = list(NULL,
                      c("color_code", "color"))
    ),
    stringsAsFactors = FALSE
  )

library(collapse)
library(dplyr)
library(glue)

my_func <- function(df, col){
  df %>% 
    collapse::na_omit(cols = c(glue("{col}"))) #Here is the code that fails
}

my_func(my_df, color_code)

The expected output can be generated with:

my_df %>% 
  collapse::na_omit(cols = c("color_code"))

and should produce:

#  color_code  color
#1        V9G   Blue
#2        J4C  White
#3        F7B Orange
#4        G3V  Green

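How should I refer to a quoted column name, passed as an argument of a user-defined function, when another function inside it expects that name in quotes?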

Answer 1

You have to provide the column name as a character string, e.g.:

my_func <- function(df, col){
  df %>% 
    collapse::na_omit(cols = c(glue("{col}"))) # works once col is passed as a character string
}

my_func(my_df, col = "color_code")
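
The difference between passing a string and passing a bare name can be sketched in base R alone. This is only an illustration under stated assumptions: the helper drop_na_rows is hypothetical and stands in for collapse::na_omit, and the data frame is a compact version of the one in the question.

```r
# Compact version of the question's data
my_df <- data.frame(
  color_code = c("V9G", NA, "J4C", NA, "F7B", "G3V"),
  color      = c("Blue", "Red", "White", "Brown", "Orange", "Green"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of collapse): drop rows where the
# column named by the string `col` is NA
drop_na_rows <- function(df, col) {
  if (!is.character(col) || length(col) != 1L) {
    stop("col must be a single column name given as a string")
  }
  df[!is.na(df[[col]]), , drop = FALSE]
}

# Standard evaluation: the string is used as-is
res <- drop_na_rows(my_df, "color_code")
nrow(res)

# A bare name is evaluated like any other variable, so R looks for an
# object called color_code and fails if there is none
msg <- tryCatch(
  drop_na_rows(my_df, color_code),
  error = function(e) conditionMessage(e)
)
msg
```

With a string the helper keeps the four rows that have a color code. With a bare name the call stops as soon as col is evaluated, which is the same reason glue("{col}") cannot turn an unquoted argument into a column name.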

Answer 2

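In general, collapse is mostly standard evaluation, and its NSE features are based on base R, so most of the rlang and glue machinery, {{ }} etc., does not work with it, but you get simpler and faster code. For functional programming with base R NSE, see http://adv-r.had.co.nz/Computing-on-the-language.html.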

For a single column, a solution is:

my_func <- function(df, col) { 
  col_char_ref <- as.character(substitute(col))
  df %>% 
    collapse::na_omit(cols = col_char_ref)
}

That is, capture the expression with substitute() and extract the variable name with as.character or all.vars. For multiple columns, a general solution is to wrap fselect, e.g.

library(collapse)
my_func <- function(df, ...) {
  cols <- fselect(df, ..., return = "indices")
  na_omit(df, cols = cols) 
}

my_func(wlddev, PCGDP:GINI, POP) |> head()
#>   country iso3c       date year decade                region
#> 1 Albania   ALB 1997-01-01 1996   1990 Europe & Central Asia
#> 2 Albania   ALB 2003-01-01 2002   2000 Europe & Central Asia
#> 3 Albania   ALB 2006-01-01 2005   2000 Europe & Central Asia
#> 4 Albania   ALB 2009-01-01 2008   2000 Europe & Central Asia
#> 5 Albania   ALB 2013-01-01 2012   2010 Europe & Central Asia
#> 6 Albania   ALB 2015-01-01 2014   2010 Europe & Central Asia
#>                income  OECD    PCGDP LIFEEX GINI       ODA     POP
#> 1 Upper middle income FALSE 1869.866 72.495 27.0 294089996 3168033
#> 2 Upper middle income FALSE 2572.721 74.579 31.7 453309998 3051010
#> 3 Upper middle income FALSE 3062.674 75.228 30.6 354950012 3011487
#> 4 Upper middle income FALSE 3775.581 75.912 30.0 338510010 2947314
#> 5 Upper middle income FALSE 4276.608 77.252 29.0 335769989 2900401
#> 6 Upper middle income FALSE 4413.297 77.813 34.6 260779999 2889104

Created on 2022-02-03 by the reprex package (v2.0.1)
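
The substitute() plus as.character() pattern described in this answer can be tried without any packages. The sketch below is illustrative only; col_name and col_names are hypothetical helpers, and the multi-column variant handles plain names only (not ranges such as PCGDP:GINI, which is what fselect is for).

```r
# Hypothetical helper: return the name that the `col` argument refers to
col_name <- function(col) as.character(substitute(col))

a <- col_name(color_code)    # bare name
b <- col_name("color_code")  # quoted name gives the same result

# Caveat: the expression is captured, not its value, so a variable
# holding the column name yields the variable's own name
nm <- "color_code"
d <- col_name(nm)

# Hypothetical multi-column variant: capture every expression in `...`
col_names <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1L]  # drop the leading `list`
  vapply(exprs, as.character, character(1))
}
e <- col_names(color_code, color)

a; b; d; e
```

Both the bare and the quoted call return "color_code", while the call with nm returns "nm". That last case is worth keeping in mind when the function may be called programmatically with names stored in variables.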

Answer 3

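It is important to first establish the environment in which you are programming in R. Are you in dplyr or in base R? If in dplyr, refer to the documentation on programming with dplyr, rlang, and glue. If in base R, refer to the documentation on non-standard evaluation, particularly on wrapping quoted columns in as.character(substitute()) and on wrapping functions that use non-quoted columns in eval(substitute()).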

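Note that both of the above methods involve non-standard evaluation. Another approach is to use standard evaluation (or some "combination" of standard and non-standard evaluation). For example, see the issue raised in this link.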

The cause of this problem stems, at least in part, from confusion about environments. Below are some different approaches in a reprex.

Data

my_df <-
  data.frame(
    matrix(
      c(
        "V9G","Blue",
        NA,"Red",
        "J4C","White",
        NA,"Brown",
        "F7B","Orange",
        "G3V","Green"
      ),
      nrow = 6,
      ncol = 2,
      byrow = TRUE,
      dimnames = list(NULL,
                      c("color_code", "color"))
    ),
    stringsAsFactors = FALSE
  )

Packages

library(collapse)
library(dplyr)
library(stringr)
library(glue)

Functional programming in base R (non-standard evaluation)
With a quoted column name:

my_func <- function(df, col) {
  col_char_ref <- as.character(substitute(col)) #Use as.character(substitute()) to refer to a quoted column name
  df %>% 
    collapse::na_omit(cols = col_char_ref) 
}

my_func(my_df, color_code)

#Should generate output below
my_df %>% 
  collapse::na_omit(cols = "color_code")

And with a non-quoted column name:

my_func <- function(df, col){
  df <- df # This makes sure "df" is available inside the function environment where we evaluate the ftransform expression
  eval(substitute(collapse::ftransform(df, count = stringr::str_length(col)))) # Wrap the function to be evaluated in eval(substitute())
}

my_func(my_df, color)

#Should generate output below
my_df %>%
  collapse::ftransform(count = stringr::str_length(color))
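
The eval(substitute()) pattern can also be sketched with base R only, using transform() and nchar() in place of ftransform() and str_length(). The helper name add_length and the small data frame are hypothetical and serve only as illustration.

```r
colors_df <- data.frame(
  color = c("Blue", "Red", "White"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: add a column holding the string length of `col`
add_length <- function(df, col) {
  # Forcing the argument turns `df` into an ordinary variable, so
  # substitute() inlines the data frame itself rather than whatever
  # expression the caller typed
  df <- df
  # substitute() replaces `col` with the caller's expression (here `color`);
  # eval() then runs the completed call
  eval(substitute(transform(df, count = nchar(col))))
}

res <- add_length(colors_df, color)
res$count
```

The call that is finally evaluated is equivalent to transform(colors_df, count = nchar(color)), so the bare column name is resolved inside the data frame.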

Functional programming in dplyr (non-standard evaluation)
With quoted column names, using glue and dplyr functions:

my_func <- function(df, col1, col2) {
  df %>%
    mutate(description := glue("color code: {pull(., {{col1}})}; color: {pull(., {{col2}})}"))
}

my_func(my_df, color_code, color)

#Should generate output below
my_df %>%
  mutate(description = glue("color code: {color_code}; color: {color}"))

Or use sprintf, a wrapper for the C function of the same name, to refer to the column names:

my_func <- function(df, col1, col2) {
  df %>%
    mutate(description := sprintf("color code: %s; color: %s", {{col1}}, {{col2}}))
}

my_func(my_df, color_code, color)

#Should generate output below
my_df %>%
  mutate(description = glue("color code: {color_code}; color: {color}"))

And with a non-quoted column name:

my_func <- function(df, col){
  df %>%  
    dplyr::mutate(count = stringr::str_length({{ col }}))
}

my_func(my_df, color)

#Should generate output below
my_df %>% 
  dplyr::mutate(count = stringr::str_length(color))

Correcting the error-producing code
The following error-producing code provides the motivation for the two examples after it:

my_func <- function(df, col){
  df <- df
  df %>%  
    collapse::na_omit(cols = as.character(substitute(col))) %>% 
    eval(substitute(collapse::ftransform(description = stringr::str_length(col))))
}

my_func(my_df, color_code)

#Error in ckmatch(cols, nam) : Unknown columns: col
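
One plausible reading of this error, which the source does not confirm, is that inside the pipe the call to substitute(col) is no longer evaluated directly in the function's own frame, so the symbol col comes back unchanged. The base R sketch below shows that behaviour in isolation; the function f is hypothetical, and local() is used here only as a stand-in for any construct that evaluates code in a nested environment.

```r
f <- function(col) {
  # Evaluated in the function's own frame, where `col` is bound to the
  # caller's expression
  direct <- as.character(substitute(col))

  # Evaluated in a child environment: `col` is not bound in that frame,
  # so substitute() leaves the symbol untouched
  nested <- local(as.character(substitute(col)))

  c(direct = direct, nested = nested)
}

out <- f(color_code)
out
```

The direct element is "color_code" while the nested element is "col", which matches the column name reported in the error message. Capturing the name before the pipe, or avoiding the pipe, sidesteps the issue.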

The following examples are alternatives that do not produce the error.

Functional programming in base R (standard evaluation, which requires the column to be passed to the function as a string)

library(pkgcond)

my_func <- function(df, col) {
  if (!is.character(substitute(col)))
    pkgcond::pkg_error("col must be a quoted string") #if users aren't used to quoted strings as inputs to a function
  df <- na_omit(df, cols = col) 
  df$count <- stringr::str_length(.subset2(df, col))
  df
}

my_func(my_df, "color_code")

#Should generate output below
my_df %>% 
  na_omit(cols = "color_code") %>% 
  ftransform(count = stringr::str_length(color_code))

Functional programming in base R (a "combination" of standard and non-standard evaluation)

my_func <- function(df, col){
  df <- df
  df <- collapse::na_omit(df, cols = as.character(substitute(col))) # Unlike the code with the error, the function is not piped (using %>%)
  eval(substitute(collapse::ftransform(df, description = stringr::str_length(col))))
}

my_func(my_df, color_code)

#Should generate output below
my_df %>% 
  na_omit(cols = "color_code") %>% 
  ftransform(description = stringr::str_length(color_code))

More complex examples using the collapse package can be referenced at this link.
