How to run a set of functions over a given list of variable names and write the output to a table?

Question

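What I'm trying to do is run mean/standard deviation calculations, as well as a statistical test, for a set of variables. It seems the right approach is to build a function so that a list of column names can be passed through it.

One potentially complicating factor is that this particular data frame needs certain functions specific to survey data.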

library(dplyr) #for %>%, group_by, summarise
library(radiant.data) #for weighted.sd
library(survey) #survey functions
library(srvyr) #survey functions

#building a df
df <- data.frame("GroupingFactor" = c(1, 1, 0, 0),
                 "VarofInterest1" = c(1, 1, 1, 0),
                 "VarofInterest2" = c(1, 0, 0, 0),
                 "PSU" = c(1, 2, 1, 2),
                 "SAMPWEIGHT" = c(0, 23254, 343, 5652),
                 "STRATA" = c(6133, 6131, 6145, 6152))

options(survey.adjust.domain.lonely=TRUE) #adjusting for the one PSU
options(survey.lonely.psu="adjust")

svy <- svydesign(~PSU, weights = ~SAMPWEIGHT, strata = ~STRATA, data = df, nest = TRUE, check.strata = FALSE) #the design

#here is what I would like to iterate

df %>% 
  group_by(GroupingFactor) %>% 
  summarise(mean = weighted.mean(VarofInterest1, SAMPWEIGHT, na.rm = TRUE),
            sd = weighted.sd(VarofInterest1, SAMPWEIGHT, na.rm = TRUE)) #for mean and SD

svychisq(~GroupingFactor+VarofInterest1, svy, statistic = 'Chisq') #the test of interest

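Everything after the creation of the svy object is what I would ideally like to automate over a list of variables, e.g. applied to a list that includes VarofInterest2, VarofInterest3, and so on.

The end product would be a table/tibble that includes all the variable names, the mean and SD for each variable, and the output of the chi-square test (e.g. the test statistic/X-squared and the p-value).

I would also welcome pointers on doing this with non-survey-weighted data (i.e. just running, say, a dozen t-tests on a similar premise: supplying a list of the variables you want to t-test against a grouping factor).

Edit: expected output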

GroupingFactor	Mean	SD	Statistic	p	Variable
0	.25	.25	341.14	.014	VarofInterest1
1	.50	.00	N/A	N/A	VarofInterest1

OR separate functions/table-generating functions, one of which gives just the means/SDs:

GroupingFactor	Mean	SD	Variable
0	.50	.25	VarofInterest1
1	.25	.00	VarofInterest1

And then the test statistics and p-values:

Variable	Statistic	p
VarofInterest1	4131.11	.001
VarofInterest2	131.14	.131

Answer 1

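You can write a function f() that takes the data, the grouping variable, and the variable of interest, and returns the statistics. You will need to adapt the example below for survey data, but it may give you a starting point.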

library(dplyr)

f <- function(df, g, v) {
  
  # Capture the column names as strings for base-R indexing
  v_string = quo_name(enquo(v))
  g_string = quo_name(enquo(g))
  
  # Chi-square test of the variable of interest against the grouping variable
  chi_result = chisq.test(df[[v_string]], df[[g_string]])
  
  # Per-group mean and SD, with the test results attached to every row
  df %>% 
    group_by({{g}}) %>% 
    summarize(Mean = mean({{v}}, na.rm = TRUE), SD = sd({{v}}, na.rm = TRUE)) %>% 
    mutate(variable = v_string,
           statistic = unname(chi_result$statistic),
           pvalue = chi_result$p.value)
}


bind_rows(
  lapply(c("VarofInterest1", "VarofInterest2"), \(i) f(df, GroupingFactor, !!sym(i)))
)

Output (on this four-row example the Yates-corrected statistic is 0 for both variables, so the p-value is 1):

# A tibble: 4 × 6
  GroupingFactor  Mean    SD variable       statistic pvalue
           <dbl> <dbl> <dbl> <chr>              <dbl>  <dbl>
1              0   0.5 0.707 VarofInterest1         0      1
2              1   1   0     VarofInterest1         0      1
3              0   0   0     VarofInterest2         0      1
4              1   0.5 0.707 VarofInterest2         0      1
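
To adapt the idea to the survey-weighted test from the question, one option is to build each formula from strings with reformulate() and call svychisq() once per variable. The sketch below is a minimal illustration under assumptions: it uses the api example data that ships with the survey package instead of the question's data frame, and svy_chisq_row() is a hypothetical helper name, not part of any package.

```r
library(survey)

# Example data shipped with the survey package (an assumption for illustration,
# not the data frame from the question).
data(api)
design <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Hypothetical helper: run one svychisq() for variable `v` against grouping
# variable `g` (both given as strings) and return the result as a one-row table.
svy_chisq_row <- function(v, g, design) {
  fml <- reformulate(c(g, v))  # builds e.g. ~sch.wide + stype
  res <- svychisq(fml, design, statistic = "Chisq")
  data.frame(Variable  = v,
             Statistic = as.numeric(res$statistic),
             p         = as.numeric(res$p.value))
}

# Variables to test against the grouping variable
vars <- c("stype", "comp.imp")

# One row per variable, stacked into a single table
test_table <- do.call(rbind, lapply(vars, svy_chisq_row, g = "sch.wide", design = design))
test_table
```

With your own design object, you would pass svy as design, "GroupingFactor" as g, and your vector of variable names as vars. This produces the "Variable / Statistic / p" table; the weighted means and SDs can be collected separately in the same lapply() style.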
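
For the non-survey case mentioned in the question (a batch of t-tests against one grouping factor), the same pattern works with base R alone. This is a sketch under assumptions: it uses the built-in mtcars data with am as a two-level grouping factor, and t_test_row() is a hypothetical helper name.

```r
# Hypothetical helper: Welch two-sample t-test of variable `v` across the
# two levels of grouping variable `g` (both given as strings).
t_test_row <- function(v, g, data) {
  fml <- reformulate(g, response = v)  # builds e.g. mpg ~ am
  res <- t.test(fml, data = data)
  data.frame(Variable  = v,
             Statistic = unname(res$statistic),
             p         = res$p.value)
}

# Variables to test against the grouping factor
vars <- c("mpg", "hp", "wt")

# One row per variable, stacked into a single table
t_table <- do.call(rbind, lapply(vars, t_test_row, g = "am", data = mtcars))
t_table
```

Passing the names as strings avoids tidy evaluation altogether, which may be simpler when the variable list is already a character vector.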


Tags: statistics, r, function, tibble
