Embracing operator inside mutate function

I'm trying to write a function that I use often in my dissertation, but I'm having difficulty getting it to run.

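The code works on its own, but it fails once I run it as a function. I think this is because of how R reads the specified variable through the embracing operator. Here is the successful code for one variable, prburden, and link to sample data: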
 rburden_data2 %>%
  
  # select only percent rent burden vars
  select(tractid, year, CBSA_name, contains("prburden")) %>%
  
  # group by tractid and count # of tracts by group 
  group_by(tractid) %>% 
  
  # create rent burden change indicator - continuous
  mutate(cont_chg_prburden = prburden[year == "2019"] - prburden[year == "2000"] ) %>%
  
  # create rent burden change indicator - categorical
  mutate(cat_chg_prburden = case_when(cont_chg_prburden   < 0 ~ "negative",
                                      cont_chg_prburden  == 0 ~ "zero",
                                      cont_chg_prburden   > 0  ~ "positive" ,
                                      TRUE ~ "NA")) %>%
  
  
  # create rent burden change indicator - binary
  mutate(bi_chg_prburden = case_when(cat_chg_prburden == "negative" ~ "loss",
                                     cat_chg_prburden == "positive" ~ "gain",
                                     TRUE ~ "NA")) %>%
  glimpse()

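In this command I select the percent rent burden variables, group by tractid, and create continuous, categorical, and binary rent burden change indicators.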

Here is the function I'm trying to run, in which I specify the same variable, prburden:

# function  ++++++++++++++++++++
change_indicators <- function(data, var){
  
  data %>%
    
    # select only percent rent burden vars
    select(tractid, year, contains("prburden")) %>%
    
    # group by tractid and count # of tracts by group 
    group_by(tractid) %>% 
    
    # create rent burden change indicator - continuous
    mutate("cont_chg_{{ var }} ":= "{{var}}"[year == 2019] - "{{var}}"[year == 2000]) %>%
    
    # create rent burden change indicator - categorical
    mutate("cat_chg_{{ var }}" := case_when("cont_chg_{{ var }}"    < 0 ~ "negative",
                                             "cont_chg_{{ var }}"   == 0 ~ "zero",
                                             "cont_chg_{{ var }}"    > 0  ~ "positive" ,
                                             TRUE ~ "NA")) %>%
    
    
    # create rent burden change indicator - binary
    mutate("bi_chg_{{ var }}" := case_when("cat_chg_{{ var }}" == "negative" ~ "loss",
                                           "cat_chg_{{ var }}" == "positive" ~ "gain",
                                           TRUE ~ "NA")) %>%
    
    
    
    glimpse() 
  
}

# test ++++++++++++++++++++
test <-  change_indicators(data = rburden_data2, 
                           var = prburden)  

# error ++++++++++++++++++++
Error: Problem with `mutate()` column `cont_chg_prburden `.
ℹ `cont_chg_prburden  = "{{var}}"[year == 2019] - "{{var}}"[year == 2000]`.
x non-numeric argument to binary operator
ℹ The error occurred in group 1: tractid = "01001020100".

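The problem I'm running into is when I pass the variable "prburden" through all of the name changes and then call it to calculate the difference between years. I'm very confused about how to use the {{}} operator: I thought I wouldn't need to use "" after the first :=, but leaving the quotes out throws an error.

Any help converting the first code chunk into an executable function would be much appreciated. Thank you!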

Try this function -

library(dplyr)

change_indicators <- function(data, var){
  val <- deparse(substitute(var))
  col1 <- paste0('cont_chg_', val)
  col2 <- paste0('cat_chg_', val)
  col3 <- paste0('bi_chg_', val)
  
  data %>%
    # select only percent rent burden vars
    select(tractid, year, contains("prburden")) %>%
    # group by tractid and count # of tracts by group 
    group_by(tractid) %>% 
    # create rent burden change indicator - continuous
    mutate(!! col1 := {{var}}[year == 2019] - {{var}}[year == 2000],
           !! col2  := case_when(.data[[col1]]    < 0 ~ "negative",
                                .data[[col1]]   == 0 ~ "zero",
                                .data[[col1]]    > 0  ~ "positive" ,
                                TRUE ~ NA_character_), 
           !!col3 := case_when(.data[[col2]] == "negative" ~ "loss",
                                .data[[col2]] == "positive" ~ "gain",
                                TRUE ~ NA_character_)) %>%
    glimpse() 
  
}
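  • Use {{var}} when you refer to the column directly.
  • I am not sure whether "cont_chg_{{ var }}" works as a column reference, so I prefer to use the .data pronoun there.
  • To name a new column from a string, use !!name := when creating it.
  • Replace "NA" with NA_character_, so that the value is a real missing value rather than the string "NA".
  • Combine everything into a single mutate call.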
data %>%
  ungroup %>%
  change_indicators(prburden)

#Rows: 219,246
#Columns: 10
#Groups: tractid [73,082]
#$ tractid           <chr> "01001020100", "01001020100", "01001020100"…
#$ year              <chr> "2000", "2013", "2019", "2000", "2013", "20…
#$ prburden_no       <dbl> 60.73620, 67.88991, 44.64286, 46.07843, 42.…
#$ prburden          <dbl> 14.110429, 13.761468, 35.119048, 16.666667,…
#$ prburden_sev      <dbl> 17.177914, 18.348624, 18.452381, 26.470588,…
#$ prburden_not      <dbl> 7.975460, 0.000000, 1.785714, 10.784314, 10…
#$ prburden_all      <dbl> 31.28834, 32.11009, 53.57143, 43.13725, 46.…
#$ cont_chg_prburden <dbl> 21.008618, 21.008618, 21.008618, 7.457847, …
#$ cat_chg_prburden  <chr> "positive", "positive", "positive", "positi…
#$ bi_chg_prburden   <chr> "gain", "gain", "gain", "gain", "gain", "ga…
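
As a side note on the original error: on the right-hand side of :=, a quoted "{{var}}" is just a character string, so subtracting two of them gives "non-numeric argument to binary operator". The glue-style name is only interpreted on the left-hand side. The following is a minimal sketch of that distinction, not a replacement for the function above. It assumes a made-up toy data frame and a hypothetical helper name, change_cont, and it uses as_label(enquo(var)) as one way to get the column name as a string.

```r
library(dplyr)
library(rlang)

# Toy data, made up for illustration: two tracts, two years each
toy <- tibble(
  tractid  = c("A", "A", "B", "B"),
  year     = c("2000", "2019", "2000", "2019"),
  prburden = c(10, 15, 20, 12)
)

# Hypothetical helper: only the continuous and categorical indicators
change_cont <- function(data, var) {
  # as_label() turns the captured argument into a string, e.g. "prburden"
  cont_name <- paste0("cont_chg_", as_label(enquo(var)))

  data %>%
    group_by(tractid) %>%
    mutate(
      # glue-style name on the left of :=, embraced (unquoted) column on the right
      "cont_chg_{{ var }}" := {{ var }}[year == "2019"] - {{ var }}[year == "2000"],
      # the newly created column is looked up by name through the .data pronoun
      "cat_chg_{{ var }}" := case_when(
        .data[[cont_name]] < 0  ~ "negative",
        .data[[cont_name]] == 0 ~ "zero",
        .data[[cont_name]] > 0  ~ "positive",
        TRUE ~ NA_character_
      )
    ) %>%
    ungroup()
}

res <- change_cont(toy, prburden)
```

Here res gains the columns cont_chg_prburden and cat_chg_prburden. The year values are compared as strings because year is a character column in this toy data, as it is in the glimpse() output above.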

I find the easiest way is to define the new variable names first, convert them to symbols, and then embrace the resulting symbols, like so:

change_indicators <- function(data, var){
  #create new variable names first, and use the rlang::sym() function to convert from string to symbol
  var_new <- sym(var)
  cont_var <- sym(paste0("cont_chg_", var))
  cat_var <- sym(paste0("cat_chg_", var))
  bi_var <- sym(paste0("bi_chg_", var))
  
  data %>%
    
    # select only percent rent burden vars
    select(tractid, year, contains("prburden")) %>%
    
    # group by tractid and count # of tracts by group 
    group_by(tractid) %>% 
    
    # create rent burden change indicator - continuous
    mutate({{cont_var}}:= {{var_new}}[year == 2019] - {{var_new}}[year == 2000]) %>%
    
    # create rent burden change indicator - categorical
    mutate({{cat_var}} := case_when({{cont_var}} < 0 ~ "negative",
                                            {{cont_var}} == 0 ~ "zero",
                                            {{cont_var}} > 0  ~ "positive" ,
                                            TRUE ~ "NA")) %>%
    # create rent burden change indicator - binary
    mutate({{bi_var}} := case_when({{cat_var}} == "negative" ~ "loss",
                                           {{cat_var}} == "positive" ~ "gain",
                                           TRUE ~ "NA")) %>%
    
    
    
    glimpse() 
  
}

Here is the output when I run this on your data, after first ungrouping the df:

rburden_data2 <- read_rds("data/rburden_data2.rds") %>% 
  ungroup()

# use a quote here, because in previous comments you loop through a vector
test <-  change_indicators(data = rburden_data2, 
                           var = "prburden")  

Rows: 219,246
Columns: 10
Groups: tractid [73,082]
$ tractid           <chr> "01001020100", "01001020100", "01001020100", "01001020200", "…
$ year              <chr> "2000", "2013", "2019", "2000", "2013", "2019", "2000", "2013…
$ prburden_no       <dbl> 60.73620, 67.88991, 44.64286, 46.07843, 42.90429, 33.85214, 6…
$ prburden          <dbl> 14.110429, 13.761468, 35.119048, 16.666667, 3.300330, 24.1245…
$ prburden_sev      <dbl> 17.177914, 18.348624, 18.452381, 26.470588, 43.564356, 38.521…
$ prburden_not      <dbl> 7.975460, 0.000000, 1.785714, 10.784314, 10.231023, 3.501946,…
$ prburden_all      <dbl> 31.28834, 32.11009, 53.57143, 43.13725, 46.86469, 62.64591, 2…
$ cont_chg_prburden <dbl> 21.0086182, 21.0086182, 21.0086182, 7.4578470, 7.4578470, 7.4…
$ cat_chg_prburden  <chr> "positive", "positive", "positive", "positive", "positive", "…
$ bi_chg_prburden   <chr> "gain", "gain", "gain", "gain", "gain", "gain", "gain", "gain…

This avoids the problem of trying to convert to symbols inside other functions (like case_when), which can be a headache.
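
Since the variable is passed as a string here, the same idea can be applied to several columns in a loop. The following is a rough sketch under stated assumptions: the toy data frame is made up, change_cont_chr is a hypothetical helper that builds only the continuous indicator, and it uses the .data pronoun with !!name := rather than sym(), which is an alternative to the approach above, not the same code.

```r
library(dplyr)

# Toy data, made up for illustration: two tracts, two years, two burden columns
toy <- tibble(
  tractid      = c("A", "A", "B", "B"),
  year         = c("2000", "2019", "2000", "2019"),
  prburden     = c(10, 15, 20, 12),
  prburden_sev = c(5, 5, 8, 9)
)

# Hypothetical helper: var is a string naming the column to difference
change_cont_chr <- function(data, var) {
  cont_name <- paste0("cont_chg_", var)

  data %>%
    group_by(tractid) %>%
    # .data[[var]] looks the column up by its string name within each group
    mutate(!!cont_name := .data[[var]][year == "2019"] - .data[[var]][year == "2000"]) %>%
    ungroup()
}

# Apply the helper to each column name in turn
vars <- c("prburden", "prburden_sev")
out <- toy
for (v in vars) {
  out <- change_cont_chr(out, v)
}
```

After the loop, out contains one cont_chg_ column per entry in vars, alongside the original columns.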