Refer to a quoted column name in a function in R
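I would like to use the na_omit function from the collapse package inside a user-defined function. na_omit requires the column name to be given in quotes as one of its arguments. If I didn't need the column name in quotes, I could just refer to it in double curly braces, {{col}}, as mentioned in the vignette "Programming with dplyr". If I refer to the column using the glue package, e.g. glue::glue("{col}"), I get an error message.
Here is a reprex: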
my_df <-
  data.frame(
    matrix(
      c(
        "V9G","Blue",
        NA,"Red",
        "J4C","White",
        NA,"Brown",
        "F7B","Orange",
        "G3V","Green"
      ),
      nrow = 6,
      ncol = 2,
      byrow = TRUE,
      dimnames = list(NULL,
                      c("color_code", "color"))
    ),
    stringsAsFactors = FALSE
  )
library(collapse)
library(dplyr)
library(glue)
my_func <- function(df, col){
  df %>%
    collapse::na_omit(cols = c(glue("{col}"))) #Here is the code that fails
}
my_func(my_df, color_code)
The expected output can be generated with:
my_df %>%
  collapse::na_omit(cols = c("color_code"))
and should produce:
# color_code color
#1 V9G Blue
#2 J4C White
#3 F7B Orange
#4 G3V Green
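How should I refer to a quoted column name in a user-defined function in R, when it is both a parameter of my function and an argument to another function?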
You have to provide the column name as a character string, e.g.:
my_func <- function(df, col){
  df %>%
    collapse::na_omit(cols = c(glue("{col}"))) # works once col is passed as a character string
}
my_func(my_df, col = "color_code")
In general, collapse is mostly standard evaluation, and its NSE features are based on base R, so most rlang and glue things, {{ }} etc., do not work, but you get simpler and faster code. For programming with base R NSE, see http://adv-r.had.co.nz/Computing-on-the-language.html. For a single column, the solution is:
my_func <- function(df, col) {
  col_char_ref <- as.character(substitute(col))
  df %>%
    collapse::na_omit(cols = col_char_ref)
}
i.e. use substitute() to capture the expression, and as.character or all.vars to extract the variable. For multiple columns, a general solution is to wrap fselect, e.g.
library(collapse)
my_func <- function(df, ...) {
  cols <- fselect(df, ..., return = "indices")
  na_omit(df, cols = cols)
}
my_func(wlddev, PCGDP:GINI, POP) |> head()
#> country iso3c date year decade region
#> 1 Albania ALB 1997-01-01 1996 1990 Europe & Central Asia
#> 2 Albania ALB 2003-01-01 2002 2000 Europe & Central Asia
#> 3 Albania ALB 2006-01-01 2005 2000 Europe & Central Asia
#> 4 Albania ALB 2009-01-01 2008 2000 Europe & Central Asia
#> 5 Albania ALB 2013-01-01 2012 2010 Europe & Central Asia
#> 6 Albania ALB 2015-01-01 2014 2010 Europe & Central Asia
#> income OECD PCGDP LIFEEX GINI ODA POP
#> 1 Upper middle income FALSE 1869.866 72.495 27.0 294089996 3168033
#> 2 Upper middle income FALSE 2572.721 74.579 31.7 453309998 3051010
#> 3 Upper middle income FALSE 3062.674 75.228 30.6 354950012 3011487
#> 4 Upper middle income FALSE 3775.581 75.912 30.0 338510010 2947314
#> 5 Upper middle income FALSE 4276.608 77.252 29.0 335769989 2900401
#> 6 Upper middle income FALSE 4413.297 77.813 34.6 260779999 2889104
Created on 2022-02-03 by the reprex package (v2.0.1)
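To illustrate the difference between as.character(substitute()) and all.vars(substitute()) mentioned in this answer, here is a minimal base R sketch. The helper names capture_one and capture_vars are hypothetical and exist only for this illustration; no collapse functions are involved.

```r
# Hypothetical helpers, named here only for illustration.
# capture_one(): turn the expression supplied for `col` into character.
capture_one <- function(col) as.character(substitute(col))
# capture_vars(): list the variable names used in the supplied expression.
capture_vars <- function(expr) all.vars(substitute(expr))

capture_one(color_code)           # "color_code"
capture_one("color_code")         # "color_code" (a string passes through unchanged)
capture_vars(color_code + color)  # "color_code" "color"
capture_one(color_code + color)   # "+" "color_code" "color"
```

as.character() is enough for a single bare name, while all.vars() is the safer choice when the caller might pass an expression, because it drops operators and function names and keeps only the variables.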
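It is important to first determine the environment in which you are programming in R. Are you in dplyr or base R? If in dplyr, reference the documentation for programming with dplyr, rlang, and glue. If in base R, reference the documentation on non-standard evaluation, especially wrapping quoted columns in as.character(substitute()) and wrapping functions that use unquoted columns in eval(substitute()).
Note that both of the approaches above involve non-standard evaluation. An alternative is to use standard evaluation (or some "combination" of standard and non-standard evaluation). For example, see the issue raised in this link.
The problem in the question arises, at least in part, from confusion about environments. Below are some different approaches in a reprex.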
Data
my_df <-
  data.frame(
    matrix(
      c(
        "V9G","Blue",
        NA,"Red",
        "J4C","White",
        NA,"Brown",
        "F7B","Orange",
        "G3V","Green"
      ),
      nrow = 6,
      ncol = 2,
      byrow = TRUE,
      dimnames = list(NULL,
                      c("color_code", "color"))
    ),
    stringsAsFactors = FALSE
  )
Packages
library(collapse)
library(dplyr)
library(stringr)
library(glue)
Programming with functions in base R (non-standard evaluation)
With a quoted column name:
my_func <- function(df, col) {
  col_char_ref <- as.character(substitute(col)) #Use as.character(substitute()) to refer to a quoted column name
  df %>%
    collapse::na_omit(cols = col_char_ref)
}
my_func(my_df, color_code)
#Should generate output below
my_df %>%
  collapse::na_omit(cols = "color_code")
And with a non-quoted column name:
my_func <- function(df, col){
  df <- df # This makes sure "df" is available inside the function environment where we evaluate the ftransform expression
  eval(substitute(collapse::ftransform(df, count = stringr::str_length(col)))) # Wrap the function to be evaluated in eval(substitute())
}
my_func(my_df, color)
#Should generate output below
my_df %>%
  collapse::ftransform(count = stringr::str_length(color))
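The eval(substitute()) pattern above can be tried without any packages. The following is a minimal sketch that assumes base R's transform() and nchar() as stand-ins for collapse::ftransform() and stringr::str_length(); the helper add_length and the data frame toy are hypothetical names used only here.

```r
# Hypothetical helper: base R stand-in for the ftransform() example.
add_length <- function(df, col) {
  df <- df  # force the argument so `df` is an ordinary local variable
  # substitute() swaps `col` for the bare name the caller typed (e.g. `color`),
  # producing a call like transform(<data>, count = nchar(color)).
  call <- substitute(transform(df, count = nchar(col)))
  # eval() runs that call; transform() then looks up `color` inside the data.
  eval(call)
}

toy <- data.frame(color = c("Blue", "Red"), stringsAsFactors = FALSE)
add_length(toy, color)
```

The key point is that the bare column name is never evaluated on its own: it is spliced into the call first and resolved against the data frame's columns afterwards.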
Programming with functions in dplyr (non-standard evaluation)
With quoted column names, using glue and dplyr functions:
my_func <- function(df, col1, col2) {
  df %>%
    mutate(description := glue("color code: {pull(., {{col1}})}; color: {pull(., {{col2}})}"))
}
my_func(my_df, color_code, color)
#Should generate output below
my_df %>%
  mutate(description = glue("color code: {color_code}; color: {color}"))
Or with quoted column names, using sprintf (a wrapper for the C function of the same name):
my_func <- function(df, col1, col2) {
  df %>%
    mutate(description := sprintf("color code: %s; color: %s", {{col1}}, {{col2}}))
}
my_func(my_df, color_code, color)
#Should generate output below
my_df %>%
  mutate(description = glue("color code: {color_code}; color: {color}"))
And with a non-quoted column name:
my_func <- function(df, col){
  df %>%
    dplyr::mutate(count = stringr::str_length({{ col }}))
}
my_func(my_df, color)
#Should generate output below
my_df %>%
  dplyr::mutate(count = stringr::str_length(color))
Correcting error-producing code
The following error-producing code motivated the two examples that follow it:
my_func <- function(df, col){
  df <- df
  df %>%
    collapse::na_omit(cols = as.character(substitute(col))) %>%
    eval(substitute(collapse::ftransform(description = stringr::str_length(col))))
}
my_func(my_df, color_code)
#Error in ckmatch(cols, nam) : Unknown columns: col
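A likely explanation for the "Unknown columns: col" message, not confirmed in the answer itself, is that substitute() only replaces symbols bound in the frame it is called from. When the call sits inside a pipe step, it may be evaluated in a nested environment where col is not bound, so the literal name col comes back. The base R sketch below reproduces that effect with local(), which is only an assumed stand-in for how a pipe evaluates its right-hand side; f_direct and f_nested are hypothetical names.

```r
# substitute() called directly in the function body: `col` is an argument here,
# so it is replaced by the expression the caller supplied.
f_direct <- function(col) as.character(substitute(col))

# substitute() called one environment deeper: local() evaluates its body in a
# fresh child environment where `col` is not bound, so the symbol stays `col`.
f_nested <- function(col) local(as.character(substitute(col)))

f_direct(color_code)  # "color_code"
f_nested(color_code)  # "col"
```

This is consistent with the working examples in this answer, which call substitute() directly in the function body rather than inside a pipe step.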
The following examples are alternatives that do not produce the error.
Programming with functions in base R (standard evaluation: requires the column to be passed to the function as a string)
library(pkgcond)
my_func <- function(df, col) {
  if (!is.character(substitute(col)))
    pkgcond::pkg_error("col must be a quoted string") #if users aren't used to quoted strings as inputs to a function
  df <- na_omit(df, cols = col)
  df$count <- stringr::str_length(.subset2(df, col))
  df
}
my_func(my_df, "color_code")
#Should generate output below
my_df %>%
  na_omit(cols = "color_code") %>%
  ftransform(count = stringr::str_length(color_code))
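One design note on the is.character(substitute(col)) check above: it accepts only a literal string typed at the call site, so a variable that holds a column name would be rejected. If that matters, validating the value instead of the expression is an option. The sketch below uses only base R, with is.na() and `[[` as assumed stand-ins for na_omit(); drop_na_rows and toy are hypothetical names.

```r
# Hypothetical helper: pure standard evaluation, validated on the value of `col`.
drop_na_rows <- function(df, col) {
  # Accept any single character value that names an existing column.
  stopifnot(is.character(col), length(col) == 1L, col %in% names(df))
  # Keep the rows where that column is not NA.
  df[!is.na(df[[col]]), , drop = FALSE]
}

toy <- data.frame(
  color_code = c("V9G", NA, "J4C"),
  color = c("Blue", "Red", "White"),
  stringsAsFactors = FALSE
)
target <- "color_code"     # the column name is held in a variable
drop_na_rows(toy, target)  # keeps the "Blue" and "White" rows
```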
Programming with functions in base R (a "combination" of standard and non-standard evaluation)
my_func <- function(df, col){
  df <- df
  df <- collapse::na_omit(df, cols = as.character(substitute(col))) # Unlike the code with the error, the function is not piped (using %>%)
  eval(substitute(collapse::ftransform(df, description = stringr::str_length(col))))
}
my_func(my_df, color_code)
#Should generate output below
my_df %>%
  na_omit(cols = "color_code") %>%
  ftransform(description = stringr::str_length(color_code))
More complex examples using the collapse package can be referenced at this link.