How do I pass arguments to srvyr inside of a function?

Question

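I am using srvyr to compute survey means of a variable (y) from a survey object, grouped by a categorical variable (x) from the same survey object. The basic code looks like this: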

library(srvyr)

survey_means <- survey_object %>%
 filter(!is.na(x), !is.na(y)) %>%  # remove NAs
 group_by(x) %>%
 summarise(Mean = survey_mean(y))

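Suppose I want to put this block of code into a function that takes the survey object and the two variables as arguments. This is a simplified version of what I am actually trying to do, which is a function that can handle a set of up to four or so variables, but this is the base case: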

SurveyMeanFunc <- function(survey_object, x, y) {

survey_means <- survey_object %>%
 filter( #remove NAs ) %>%
 group_by(survey_object[["variables"]][[x]]) %>%
 summarise(Mean = survey_mean(survey_object[["variables"]][[y]]))
 
return(survey_means) 

}

When I try to use this function, I always get an error message:

! Assigned data `x` must be compatible with existing data.
x Existing data has n rows.
x Assigned data has m rows. (m > n)
i Only vectors of size 1 are recycled.

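Even when I split the pipeline before the summarise() call and verify that x has the same number of rows as y, I still get this message. What is summarise() doing that I don't understand?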

[Edit] Full context with the suggested changes:

SurveyMeanMedFunc <- function(survey_obj, xvar, yvar, categ1= NULL, categ2= NULL) {
  
  if (is.null(categ1) & is.null(categ2)) {
    
    survey_estimate <- survey_obj %>%
      filter(!is.na({{ xvar }}), !is.na({{ yvar }})) %>%
      group_by({{ xvar }}) %>%
      summarise(Mean = survey_mean({{ yvar }}, vartype = "ci"))
              
  } else if (is.null(categ2)) {
    
    survey_estimate <- survey_obj %>%
      filter(!is.na({{ xvar }}), !is.na({{ yvar }})) %>%
      group_by({{ xvar }}, {{ categ1 }}) %>%
      summarise(Mean = survey_mean({{ yvar }}, vartype = "ci"))
    
  } else {
    
    NULL #fix
    
  }
  
  return(survey_estimate)
  
}

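The remaining problem is that using quasiquotation to refer to the survey variables works in the top branch of this if-else statement, but in the following else if block the function arguments are not recognised, even though they are handled the same way with {{ }}.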

Answer 1

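You don't give an example of how you want to use the function, but if I understand correctly, you want your first block of code to run with x replaced by the name of the variable passed as the x argument, and y replaced by the name of the variable passed as the y argument (with the 'remove NAs' line either deleted or fixed so that it does something).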

That is, you want SurveyMeanFunc(my_design, species, height) to become

my_design %>%
 group_by(species) %>%
 summarise(Mean = survey_mean(height))

This is complicated because you don't want the value of x or the name x; you want the name species.

One way to do this is quasiquotation, which used to require enquo and !!, but can now be done more easily with the {{ }} operator:

SurveyMeanFunc <- function(survey_object, x, y) {
survey_means <- survey_object %>%
 group_by({{ x }}) %>%
 summarise(Mean = survey_mean({{ y }}))
 survey_means
}

which gives

> dstrata <- apistrat %>%
+   as_survey(strata = stype, weights = pw)
> 
> SurveyMeanFunc(dstrata, stype, api00)
# A tibble: 3 × 3
  stype  Mean Mean_se
  <fct> <dbl>   <dbl>
1 E      674.    12.5
2 H      626.    15.5
3 M      637.    16.6
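For comparison, here is a rough sketch of what the older enquo and !! style might look like for the same function. The name SurveyMeanFuncOld is made up for illustration, and the sketch assumes the apistrat data from the survey package is what was used to build dstrata above.

```r
library(srvyr)

# Assumption: apistrat comes from the api datasets shipped with the survey package
data(api, package = "survey")

# Hypothetical name; same idea as SurveyMeanFunc, written in the older style
SurveyMeanFuncOld <- function(survey_object, x, y) {
  # Capture the unevaluated argument expressions as quosures
  x <- rlang::enquo(x)
  y <- rlang::enquo(y)

  survey_object %>%
    group_by(!!x) %>%                      # unquote the grouping variable
    summarise(Mean = survey_mean(!!y))     # unquote the variable to average
}

dstrata <- apistrat %>%
  as_survey(strata = stype, weights = pw)

old_style <- SurveyMeanFuncOld(dstrata, stype, api00)
old_style
```

The {{ x }} form does the capture and the unquoting in one step, which is why it is usually the easier choice.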

Update

You still don't give an example of how you want to use the function, but I think this works:

SurveyMeanMedFunc <- function(survey_obj, xvar, yvar, categ1, categ2) {
  
  if (missing(categ1) & missing(categ2)) {
    
    survey_estimate <- survey_obj %>%
      filter(!is.na({{ xvar }}), !is.na({{ yvar }})) %>%
      group_by({{ xvar }}) %>%
      summarise(Mean = survey_mean({{ yvar }}, vartype = "ci"))
              
  } else if (missing(categ2)) {
    
    survey_estimate <- survey_obj %>%
      filter(!is.na({{ xvar }}), !is.na({{ yvar }})) %>%
      group_by({{ xvar }}, {{ categ1 }}) %>%
      summarise(Mean = survey_mean({{ yvar }}, vartype = "ci"))
    
  } else {
    
   survey_estimate <- survey_obj %>%
      filter(!is.na({{ xvar }}), !is.na({{ yvar }})) %>%
      group_by({{ categ2 }}, {{ categ1 }}) %>%
      summarise(Mean = survey_mean({{ yvar }}, vartype = "ci"))
    
  }
  
  return(survey_estimate)
  
}

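The problem is that if categ1 or categ2 is supplied by the user, it cannot be evaluated in the if condition, because you are not evaluating it within the survey object, so R doesn't know where to look. This is a consequence of the way the tidyverse uses unquoted variable names: if you supplied the variables as model formulas (as in survey) or as quoted strings, you would be fine.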
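As a sketch of the quoted-string route, the function below takes column names as character strings, so the NULL defaults and ordinary checks on them behave normally. The name SurveyMeanStr is hypothetical, the conversion via rlang::sym()/rlang::syms() is just one possible way to turn strings into column references, and unlike the function in this answer it always groups by xvar plus whichever extra categories are supplied.

```r
library(srvyr)

# Assumption: apistrat comes from the api datasets shipped with the survey package
data(api, package = "survey")

# Hypothetical variant taking column names as strings
SurveyMeanStr <- function(survey_obj, xvar, yvar, categ1 = NULL, categ2 = NULL) {
  # c() silently drops NULLs, so only supplied names remain
  group_names <- c(xvar, categ1, categ2)

  survey_obj %>%
    filter(!is.na(!!rlang::sym(yvar))) %>%           # drop rows with missing y
    group_by(!!!rlang::syms(group_names)) %>%        # strings -> grouping columns
    summarise(Mean = survey_mean(!!rlang::sym(yvar), vartype = "ci"))
}

dstrata <- apistrat %>%
  as_survey(strata = stype, weights = pw)

by_type      <- SurveyMeanStr(dstrata, "stype", "enroll")
by_type_wide <- SurveyMeanStr(dstrata, "stype", "enroll", categ1 = "sch.wide")
```

Because the arguments are plain strings, there is no need for separate if branches: the grouping vector is simply built from whatever was passed.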

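The missing function asks whether an argument was supplied, which is what you want in this case. The rlang package has a more flexible is_missing/maybe_missing setup that you could consider as an alternative. But this appears to work: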

> SurveyMeanMedFunc(dstrata,stype,enroll,sch.wide,comp.imp)
# A tibble: 4 × 5
# Groups:   comp.imp [2]
  comp.imp sch.wide  Mean Mean_low Mean_upp
  <fct>    <fct>    <dbl>    <dbl>    <dbl>
1 No       No       1013.     810.    1216.
2 No       Yes       525.     438.     611.
3 Yes      No        370.     207.     533.
4 Yes      Yes       521.     475.     566.
> SurveyMeanMedFunc(dstrata,stype,enroll,sch.wide)
# A tibble: 6 × 5
# Groups:   stype [3]
  stype sch.wide  Mean Mean_low Mean_upp
  <fct> <fct>    <dbl>    <dbl>    <dbl>
1 E     No        420.     340.     499.
2 E     Yes       417.     381.     452.
3 H     No       1520.    1209.    1830.
4 H     Yes      1137.     946.    1328.
5 M     No        967.     709.    1226.
6 M     Yes       775.     669.     881.
> SurveyMeanMedFunc(dstrata,stype,enroll)
# A tibble: 3 × 4
  stype  Mean Mean_low Mean_upp
  <fct> <dbl>    <dbl>    <dbl>
1 E      417.     384.     450.
2 H     1321.    1134.    1508.
3 M      832.     722.     943.
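The difference between the is.null() checks in the question and the missing() checks here can be seen in a small base R sketch. The function and column names (check_null, check_missing, height_cm) are invented for illustration, and the sketch assumes no object called height_cm exists in the calling environment.

```r
# missing() only asks whether the argument was supplied; it never evaluates it
check_missing <- function(df, col) {
  if (missing(col)) "not supplied" else "supplied"
}

# is.null() has to evaluate the argument, which forces R to look up the name
check_null <- function(df, col = NULL) {
  if (is.null(col)) "not supplied" else "supplied"
}

df <- data.frame(height_cm = c(150, 172))

r1 <- check_missing(df)             # argument absent
r2 <- check_missing(df, height_cm)  # bare column name is fine: never evaluated
r3 <- check_null(df)                # default NULL is evaluated without trouble

# A bare column name fails here because height_cm is not a variable in scope
r4 <- tryCatch(
  check_null(df, height_cm),
  error = function(e) conditionMessage(e)
)
```

Here r4 holds the error message from the failed lookup, which is the same kind of failure the else if branch ran into when categ1 was an unquoted column name.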
