Embracing operator inside mutate function
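I am trying to write a function that I use frequently in my dissertation, but I am having a hard time getting it to run.
The code works on its own, but it fails once I run it as a function. I think this is because of how R reads the specified variable through the embracing operator. Here is the successful code for one variable, prburden (link to sample data):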
rburden_data2 %>%
  # select only percent rent burden vars
  select(tractid, year, CBSA_name, contains("prburden")) %>%
  # group by tractid and count # of tracts by group
  group_by(tractid) %>%
  # create rent burden change indicator - continuous
  mutate(cont_chg_prburden = prburden[year == "2019"] - prburden[year == "2000"]) %>%
  # create rent burden change indicator - categorical
  mutate(cat_chg_prburden = case_when(cont_chg_prburden < 0 ~ "negative",
                                      cont_chg_prburden == 0 ~ "zero",
                                      cont_chg_prburden > 0 ~ "positive",
                                      TRUE ~ "NA")) %>%
  # create rent burden change indicator - binary
  mutate(bi_chg_prburden = case_when(cat_chg_prburden == "negative" ~ "loss",
                                     cat_chg_prburden == "positive" ~ "gain",
                                     TRUE ~ "NA")) %>%
  glimpse()
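In this command I:
- take my dataset (rburden_data2)
- subset it to the important variables only
- group by census tract (tractid)
- create a continuous indicator of the difference in rent burden between the first and last year
- create a categorical indicator of negative, zero, and positive change
- create a binary indicator of gain or loss over the study period
Here is the function I am trying to run, in which I specify the same variable, prburden: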
# function ++++++++++++++++++++
change_indicators <- function(data, var){
  data %>%
    # select only percent rent burden vars
    select(tractid, year, contains("prburden")) %>%
    # group by tractid and count # of tracts by group
    group_by(tractid) %>%
    # create rent burden change indicator - continuous
    mutate("cont_chg_{{ var }} ":= "{{var}}"[year == 2019] - "{{var}}"[year == 2000]) %>%
    # create rent burden change indicator - categorical
    mutate("cat_chg_{{ var }}" := case_when("cont_chg_{{ var }}" < 0 ~ "negative",
                                            "cont_chg_{{ var }}" == 0 ~ "zero",
                                            "cont_chg_{{ var }}" > 0 ~ "positive" ,
                                            TRUE ~ "NA")) %>%
    # create rent burden change indicator - binary
    mutate("bi_chg_{{ var }}" := case_when("cat_chg_{{ var }}" == "negative" ~ "loss",
                                           "cat_chg_{{ var }}" == "positive" ~ "gain",
                                           TRUE ~ "NA")) %>%
    glimpse()
}
# test ++++++++++++++++++++
test <- change_indicators(data = rburden_data2,
                          var = prburden)
# error ++++++++++++++++++++
Error: Problem with `mutate()` column `cont_chg_prburden `.
ℹ `cont_chg_prburden = "{{var}}"[year == 2019] - "{{var}}"[year == 2000]`.
x non-numeric argument to binary operator
ℹ The error occurred in group 1: tractid = "01001020100".
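The problem I run into comes when I pass the variable "prburden" through all of the name changes and then call it to calculate the difference between years. I am quite confused about how to use the {{}} operator: I thought I would not need the "" after the first :=, but it throws an error without them.
Any help converting the first code chunk into a working function would be much appreciated. Thank you!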
Try this function -
library(dplyr)
change_indicators <- function(data, var){
  val <- deparse(substitute(var))
  col1 <- paste0('cont_chg_', val)
  col2 <- paste0('cat_chg_', val)
  col3 <- paste0('bi_chg_', val)
  data %>%
    # select only percent rent burden vars
    select(tractid, year, contains("prburden")) %>%
    # group by tractid and count # of tracts by group
    group_by(tractid) %>%
    # create rent burden change indicators - continuous, categorical, binary
    mutate(!! col1 := {{var}}[year == 2019] - {{var}}[year == 2000],
           !! col2 := case_when(.data[[col1]] < 0 ~ "negative",
                                .data[[col1]] == 0 ~ "zero",
                                .data[[col1]] > 0 ~ "positive" ,
                                TRUE ~ NA_character_),
           !! col3 := case_when(.data[[col2]] == "negative" ~ "loss",
                                .data[[col2]] == "positive" ~ "gain",
                                TRUE ~ NA_character_)) %>%
    glimpse()
}
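- Use {{var}} when using the column name directly.
- I am not sure whether "cont_chg_{{ var }}" works; I prefer to use the .data pronoun there.
- To assign column names, use !!name := to create the new column.
- Replaced "NA" with NA_character_.
- Combined everything into one mutate call.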
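To make two of the notes above concrete, here is a small base-R sketch (not part of the original answer). It illustrates why the quoted "{{var}}" on the right-hand side fails, and how the string "NA" differs from NA_character_. The object name res is only illustrative.

```r
# A quoted "{{var}}" is an ordinary character string, not a column reference,
# so subtracting two of them raises an error like the one in the question.
res <- tryCatch(
  "{{var}}" - "{{var}}",
  error = function(e) conditionMessage(e)
)
print(res)

# The string "NA" is a regular value; NA_character_ is a true missing value.
is.na("NA")           # FALSE
is.na(NA_character_)  # TRUE
```

With NA_character_, later calls such as is.na() or tidyr::drop_na() treat the unmatched rows as missing, which the literal string "NA" would not.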
data %>%
ungroup %>%
change_indicators(prburden)
#Rows: 219,246
#Columns: 10
#Groups: tractid [73,082]
#$ tractid <chr> "01001020100", "01001020100", "01001020100"…
#$ year <chr> "2000", "2013", "2019", "2000", "2013", "20…
#$ prburden_no <dbl> 60.73620, 67.88991, 44.64286, 46.07843, 42.…
#$ prburden <dbl> 14.110429, 13.761468, 35.119048, 16.666667,…
#$ prburden_sev <dbl> 17.177914, 18.348624, 18.452381, 26.470588,…
#$ prburden_not <dbl> 7.975460, 0.000000, 1.785714, 10.784314, 10…
#$ prburden_all <dbl> 31.28834, 32.11009, 53.57143, 43.13725, 46.…
#$ cont_chg_prburden <dbl> 21.008618, 21.008618, 21.008618, 7.457847, …
#$ cat_chg_prburden <chr> "positive", "positive", "positive", "positi…
#$ bi_chg_prburden <chr> "gain", "gain", "gain", "gain", "gain", "ga…
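On the open question of whether "cont_chg_{{ var }}" works: as far as I know, reasonably recent versions of dplyr/rlang accept that glue-style string as a name on the left-hand side of :=, but on the right-hand side it stays a plain string, so the .data pronoun is still needed there. The sketch below is an assumption-laden illustration, not the answerer's code: the toy data and the function name change_indicators_glue are hypothetical stand-ins.

```r
library(dplyr)

# Hypothetical toy data standing in for rburden_data2
toy <- tibble(
  tractid  = rep(c("A", "B"), each = 2),
  year     = rep(c("2000", "2019"), times = 2),
  prburden = c(10, 15, 30, 20)
)

change_indicators_glue <- function(data, var) {
  # Column name of the unquoted argument, as a string
  val      <- rlang::as_name(rlang::enquo(var))
  cont_col <- paste0("cont_chg_", val)

  data %>%
    group_by(tractid) %>%
    mutate(
      # Glue syntax is accepted on the left-hand side of :=
      "cont_chg_{{ var }}" := {{ var }}[year == "2019"] - {{ var }}[year == "2000"],
      # On the right-hand side, look the new column up through .data
      "cat_chg_{{ var }}" := case_when(
        .data[[cont_col]] < 0 ~ "negative",
        .data[[cont_col]] == 0 ~ "zero",
        .data[[cont_col]] > 0 ~ "positive",
        TRUE ~ NA_character_
      )
    ) %>%
    ungroup()
}

out <- change_indicators_glue(toy, prburden)
print(out)
```

Tract A goes from 10 to 15 and tract B from 30 to 20, so the toy result should carry one positive and one negative change.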
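I find the easiest approach is to define the new variable names first, convert them to symbols, and then embrace the resulting variables, like so: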
change_indicators <- function(data, var){
  #create new variable names first, and use the rlang::sym() function to convert from string to symbol
  var_new <- sym(var)
  cont_var <- sym(paste0("cont_chg_", var))
  cat_var <- sym(paste0("cat_chg_", var))
  bi_var <- sym(paste0("bi_chg_", var))
  data %>%
    # select only percent rent burden vars
    select(tractid, year, contains("prburden")) %>%
    # group by tractid and count # of tracts by group
    group_by(tractid) %>%
    # create rent burden change indicator - continuous
    mutate({{cont_var}}:= {{var_new}}[year == 2019] - {{var_new}}[year == 2000]) %>%
    # create rent burden change indicator - categorical
    mutate({{cat_var}} := case_when({{cont_var}} < 0 ~ "negative",
                                    {{cont_var}} == 0 ~ "zero",
                                    {{cont_var}} > 0 ~ "positive" ,
                                    TRUE ~ "NA")) %>%
    # create rent burden change indicator - binary
    mutate({{bi_var}} := case_when({{cat_var}} == "negative" ~ "loss",
                                   {{cat_var}} == "positive" ~ "gain",
                                   TRUE ~ "NA")) %>%
    glimpse()
}
Here is the output when I run this on your data, after first ungrouping the data frame:
rburden_data2 <- read_rds("data/rburden_data2.rds") %>%
ungroup()
# use a quote here, because in previous comments you loop through a vector
test <- change_indicators(data = rburden_data2,
var = "prburden")
Rows: 219,246
Columns: 10
Groups: tractid [73,082]
$ tractid <chr> "01001020100", "01001020100", "01001020100", "01001020200", "…
$ year <chr> "2000", "2013", "2019", "2000", "2013", "2019", "2000", "2013…
$ prburden_no <dbl> 60.73620, 67.88991, 44.64286, 46.07843, 42.90429, 33.85214, 6…
$ prburden <dbl> 14.110429, 13.761468, 35.119048, 16.666667, 3.300330, 24.1245…
$ prburden_sev <dbl> 17.177914, 18.348624, 18.452381, 26.470588, 43.564356, 38.521…
$ prburden_not <dbl> 7.975460, 0.000000, 1.785714, 10.784314, 10.231023, 3.501946,…
$ prburden_all <dbl> 31.28834, 32.11009, 53.57143, 43.13725, 46.86469, 62.64591, 2…
$ cont_chg_prburden <dbl> 21.0086182, 21.0086182, 21.0086182, 7.4578470, 7.4578470, 7.4…
$ cat_chg_prburden <chr> "positive", "positive", "positive", "positive", "positive", "…
$ bi_chg_prburden <chr> "gain", "gain", "gain", "gain", "gain", "gain", "gain", "gain…
This avoids the problem of trying to convert to symbols inside other functions (such as case_when), which can be a headache.
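Because this version takes the column name as a string, it can be applied to several variables in turn. The sketch below is a minimal illustration under assumptions: the toy data and the helper add_change are hypothetical, and it swaps sym() for the .data pronoun, which is another way to look up a column from a string.

```r
library(dplyr)

# Hypothetical toy data with two rent burden variables
toy <- tibble(
  tractid      = rep(c("A", "B"), each = 2),
  year         = rep(c("2000", "2019"), times = 2),
  prburden     = c(10, 15, 30, 20),
  prburden_sev = c(5, 5, 8, 12)
)

# String-based variant: `var` is a column name given as a character string
add_change <- function(data, var) {
  cont_col <- paste0("cont_chg_", var)
  data %>%
    group_by(tractid) %>%
    mutate(!!cont_col := .data[[var]][year == "2019"] - .data[[var]][year == "2000"]) %>%
    ungroup()
}

# Apply the function once per variable name, carrying the data forward
vars <- c("prburden", "prburden_sev")
out <- Reduce(add_change, vars, init = toy)
print(out)
```

Reduce() feeds the result of each call into the next one, so out ends up with one cont_chg_ column per entry in vars.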