How to recode a variable programmatically using quasiquotation?
I have the following dataset and want to recode its variables:
library(tidyverse)
library(rlang)
mytib <- tribble(~colA, ~colB, ~colC,
"good", "bad", "better",
"better", "bad", "worse",
"good", "best", "good")
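My real dataset has many more columns, so I am looking for a programmatic way to recode it so that "bad" and "worse" are collapsed into "terrible", and "good", "better", and "best" are collapsed into "awesome". The results should go into new columns, one per variable, named like "colA_bin" (for binary), "colB_bin", and "colC_bin". Because I have so many columns, I would like to pick them with dplyr::select(starts_with(...) & ends_with(...)).
What I have come up with is: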
attractiveness_vars <- mytib %>%
  dplyr::select(starts_with(c("col")) & ends_with(c("A", "B", "C"))) %>%
  names(.)
attractiveness_lvls_neg <- c("bad", "worse")
attractiveness_lvls_pos <- c("good", "better", "best")
attractiveness_lvls_new <- c("terrible", "awesome")
recode_attractiveness <- function(dataframe, column_name, lvls_neg, lvls_pos, lvls_new){
  new_col <- dataframe %>%
    mutate({{column_name}} := factor(
      case_when({{column_name}} %in% lvls_neg ~ lvls_new[1],
                {{column_name}} %in% lvls_pos ~ lvls_new[2],
                TRUE ~ NA_character_),
      levels = lvls_new)) %>%
    pull({{column_name}})
  return(new_col)
}
When I run
recode_attractiveness(mytib, attractiveness_vars, attractiveness_lvls_neg, attractiveness_lvls_pos, attractiveness_lvls_new)
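I get an error: ℹ Input `attractiveness_vars` must be size [NROW] or 1, not [length(attractiveness_vars)].
Note: the real message reports the actual numbers; I replaced them with placeholders to make it easier to read.
There is probably an easier way to do this. I would love to know whether there is a quasiquotation approach to this problem or (whether or not there is one) an elegant programmatic solution, i.e. one that does not require me to type out the case_when(...) code.
The expected output should look like this: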
colA colA_bin colB colB_bin colC colC_bin
"good" "awesome" "bad" "terrible" "better" "awesome"
...
Maybe skip the function definition altogether and use across?
library(dplyr) # Version >= 1.0.0
mytib %>%
  mutate(across(all_of(attractiveness_vars),
                ~ factor(case_when(. %in% attractiveness_lvls_neg ~ attractiveness_lvls_new[1],
                                   . %in% attractiveness_lvls_pos ~ attractiveness_lvls_new[2],
                                   TRUE ~ NA_character_),
                         levels = attractiveness_lvls_new),
                .names = "{col}_bin"))
# A tibble: 3 x 6
  colA   colB  colC   colA_bin colB_bin colC_bin
  <chr>  <chr> <chr>  <fct>    <fct>    <fct>
1 good   bad   better awesome  terrible awesome
2 better bad   worse  awesome  terrible terrible
3 good   best  good   awesome  awesome  awesome
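If you would rather keep a function, as in the question, here is a minimal sketch of one way to do it. As far as I can tell, the original attempt fails because `{{ }}` expects a bare column name supplied by the caller. Given a character vector of names, the vector itself is evaluated, so the result has one element per name rather than one per row. The helper `recode_one` below is hypothetical (it is not from the original post). It takes a single column name as a string, looks the column up with the `.data` pronoun, and builds the new name with a glue-style string on the left of `:=`. It assumes dplyr >= 1.0.0 (for `.after`) and a reasonably recent rlang.

```r
library(dplyr)
library(tibble)

mytib <- tribble(
  ~colA,    ~colB,  ~colC,
  "good",   "bad",  "better",
  "better", "bad",  "worse",
  "good",   "best", "good"
)

# Hypothetical helper: recode ONE column, identified by its name as a string,
# into a new "<name>_bin" factor column placed right after the source column.
recode_one <- function(data, col_name, lvls_neg, lvls_pos, lvls_new) {
  data %>%
    mutate(
      "{col_name}_bin" := factor(
        case_when(
          .data[[col_name]] %in% lvls_neg ~ lvls_new[1],
          .data[[col_name]] %in% lvls_pos ~ lvls_new[2],
          TRUE ~ NA_character_
        ),
        levels = lvls_new
      ),
      .after = all_of(col_name)
    )
}

# Apply the helper to each column name in turn.
result <- mytib
for (col_name in c("colA", "colB", "colC")) {
  result <- recode_one(
    result, col_name,
    lvls_neg = c("bad", "worse"),
    lvls_pos = c("good", "better", "best"),
    lvls_new = c("terrible", "awesome")
  )
}

names(result)
```

Because each new column is placed with `.after`, the columns come out interleaved (colA, colA_bin, colB, ...) as in the expected output of the question. The `across()` version is shorter, so this is mainly useful if you need the per-column function for other reasons.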
For bonus points, you can use forcats::fct_collapse:
library(forcats)
attractiveness_factors <- setNames(list(attractiveness_lvls_neg, attractiveness_lvls_pos),
attractiveness_lvls_new)
attractiveness_factors
#$terrible
#[1] "bad" "worse"
#$awesome
#[1] "good" "better" "best"
mytib %>%
  mutate(across(all_of(attractiveness_vars),
                ~ fct_collapse(., !!!attractiveness_factors),
                .names = "{col}_bin"))
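One caveat with the fct_collapse route, as far as I know: it warns when a column does not contain every level listed in the mapping, and the resulting factor only carries the levels that actually occur in that column. If you want every `_bin` column to have the same two levels, as in the case_when version, a small wrapper can re-level the result. The helper name `collapse_to` below is hypothetical, and this is only a sketch under those assumptions.

```r
library(dplyr)
library(tibble)
library(forcats)

mytib <- tribble(
  ~colA,    ~colB,  ~colC,
  "good",   "bad",  "better",
  "better", "bad",  "worse",
  "good",   "best", "good"
)

attractiveness_vars     <- c("colA", "colB", "colC")
attractiveness_lvls_new <- c("terrible", "awesome")
attractiveness_factors  <- list(
  terrible = c("bad", "worse"),
  awesome  = c("good", "better", "best")
)

# Hypothetical helper: collapse the old levels, then force a fixed level set.
# Values outside the mapping become NA, mirroring the TRUE ~ NA_character_ branch.
collapse_to <- function(x, mapping, lvls) {
  # Silence the "unknown levels" warning for levels missing from this column.
  collapsed <- suppressWarnings(fct_collapse(x, !!!mapping))
  factor(collapsed, levels = lvls)
}

result <- mytib %>%
  mutate(across(all_of(attractiveness_vars),
                ~ collapse_to(.x, attractiveness_factors, attractiveness_lvls_new),
                .names = "{col}_bin"))

levels(result$colA_bin)
```

Splicing with `!!!` happens inside the helper rather than inside mutate(), which keeps the across() call free of quasiquotation.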