How to loop through several columns to generate multiple crosstabs

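I am trying to create a document containing multiple crosstabs, using several y variables and one x variable. Each y variable should get its own table. I can do this for each crosstab individually in R Markdown and produce HTML tables with the kableExtra package. However, I have several variables, and it would be easier to do this with a loop. In Stata, I would do this: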

foreach i of varlist var1 var2 var3 {
  tab `i' year, row
}

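The problem I ran into with Stata is that it does not apply the frequency weights in tab. R does apply frequency weights in crosstabs (the descr package) and produces row and column percentages.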
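In a weighted crosstab, each respondent adds their weight to a cell instead of adding 1. Row or column percentages then divide those weighted sums by the row or column totals.

The sketch below shows this with base R's `stats::xtabs()` and `prop.table()`. It is not how descr implements it. The data frame `dat` is a toy subset of the example data in this question, with the weights rounded to three decimals.

```r
# Toy subset of the question's example data (weights rounded to 3 decimals)
dat <- data.frame(
  survey_yr = c(2019, 2020, 2019, 2020, 2019, 2020, 2019, 2020, 2019, 2020,
                2019, 2020, 2020, 2019, 2019, 2020, 2019, 2020, 2019, 2020),
  Q2_3L     = c(3, 2, 3, 3, 3, 1, 3, 3, 3, 3, 1, 1, 3, 3, 3, 3, 3, 1, 3, 2),
  weight    = c(0.68, 0.68, rep(0.823, 8), 1.271, rep(0.823, 6), 1.576,
                0.823, 0.823)
)

# Weighted counts: each row contributes its weight to its cell
wt_tab <- xtabs(weight ~ Q2_3L + survey_yr, data = dat)

# Column proportions: share of each answer within a survey year
col_prop <- prop.table(wt_tab, margin = 2)
# Row proportions: share of each survey year within an answer category
row_prop <- prop.table(wt_tab, margin = 1)

print(round(addmargins(wt_tab), 3))
print(round(col_prop, 3))
```

The descr output in the answer appears to show weighted counts rounded to whole numbers. This sketch keeps the unrounded sums, so its proportions may differ slightly from that printout.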

Here is an example data frame:

structure(list(survey_yr = c(2019, 2020, 2019, 2020, 2019, 2020, 
2019, 2020, 2019, 2020, 2019, 2020, 2020, 2019, 2019, 2020, 2019, 
2020, 2019, 2020), Main_Data = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1), Survey_Month = c(6, 6, 7, 7, 7, 
7, 7, 7, 7, 7, 7, 7, 7, 9, 9, 9, 9, 9, 9, 9), Quarter = c(1, 
1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), Q1A_3L = c(3, 
1, 3, 3, 3, 1, 3, 3, 3, 3, 1, 2, 3, 3, 3, 3, 3, 1, 3, 2), Q1B_3L = c(3, 
1, 3, 3, 3, 1, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 2, 3, 2), Q1C_3L = c(3, 
1, 3, 3, 3, 1, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 2, 3, 2), Q1D_3L = c(3, 
2, 3, 3, 3, 1, 3, 3, 3, 2, 1, 3, 3, 3, 3, 3, 3, 1, 3, 2), Q1E_3L = c(3, 
2, 3, 3, 3, 1, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 2), Q1F_3L = c(3, 
3, 3, 3, 3, 1, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 2), Q1G_3L = c(3, 
3, 3, 3, 3, 1, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 2, 3, 2), Q1H_3L = c(3, 
2, 3, 3, 3, 1, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 2, 3, 2), Q1I_3L = c(3, 
3, 3, 3, 3, 1, 3, 3, 3, 2, 1, 2, 3, 3, 3, 3, 3, 1, 3, 2), Q1J_3L = c(3, 
3, 3, 3, 3, 1, 3, 3, 3, 2, 1, 3, 3, 3, 3, 3, 3, 1, 3, 2), Q1K_3L = c(3, 
3, 3, 3, 3, 1, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 2, 3, 2), Q1L_3L = c(3, 
3, 3, 3, 3, 1, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 1, 3, 2), Q1M_3L = c(3, 
2, 3, 3, 3, 1, 3, 3, 3, 3, 1, 3, 3, 3, 3, 1, 3, 2, 3, 2), Q1N_3L = c(3, 
2, 3, 3, 3, 1, 3, 3, 3, 2, 1, 3, 3, 3, 3, 1, 3, 3, 3, 2), Q1O_3L = c(3, 
2, 3, 3, 3, 1, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 1, 3, 2), Q1P_3L = c(3, 
2, 3, 3, 3, 1, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 1, 3, 2), Q1Q_3L = c(3, 
1, 3, 3, 3, 1, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 2, 3, 2), Q2_3L = c(3, 
2, 3, 3, 3, 1, 3, 3, 3, 3, 1, 1, 3, 3, 3, 3, 3, 1, 3, 2), weight = c(0.680000007152557, 
0.680000007152557, 0.823000013828278, 0.823000013828278, 0.823000013828278, 
0.823000013828278, 0.823000013828278, 0.823000013828278, 0.823000013828278, 
0.823000013828278, 1.27100002765656, 0.823000013828278, 0.823000013828278, 
0.823000013828278, 0.823000013828278, 0.823000013828278, 0.823000013828278, 
1.57599997520447, 0.823000013828278, 0.823000013828278)), row.names = c(4L, 
5L, 6L, 7L, 9L, 10L, 11L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 
23L, 26L, 27L, 28L, 31L, 33L), class = "data.frame")

The code I have been using for a single crosstab, which produces a very nice table, is:

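library(descr)
library(knitr)
library(kableExtra)

# Weighted crosstab of one variable by survey year
ct2 <- crosstab(dat$Q2_3L, dat$survey_yr, weight = dat$weight,
                total.c = TRUE, plot = FALSE)
ct2_tab <- descr:::CreateNewTab(ct2)  # internal descr function, accessed with :::
class(ct2_tab)
kable(ct2_tab) %>%
  kable_classic(full_width = FALSE, html_font = "Cambria")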

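I want to find a way to write a loop that does this for several columns. The code below comes close, but it has three problems. First, I want specific columns, not every column in the data frame. Second, it replaces the actual column name with "col", and I need to keep the original column names. Finally, I don't know how to export the results to HTML, docx, Excel or any other kind of document.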

for (col in df) {
  ct_ <- crosstab(col, dat1$survey_yr,
                  weight = dat1$weight, format = "SPSS", prop.c = TRUE, plot = FALSE)
  print(ct_)
}
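The attempted loop loses the names because `for (col in df)` iterates over the column values of the data frame. Each `col` is therefore a plain vector with no name attached.

Iterating over a character vector of column names and indexing with `[[` keeps the name available, and also lets you choose exactly which columns to use. A minimal sketch with a hypothetical two-column data frame `toy`:

```r
# Hypothetical toy data frame, only to show what the loop variable holds
toy <- data.frame(Q1A_3L = c(3, 1, 3), Q2_3L = c(3, 2, 1))

# Looping over the data frame itself: each `col` is an unnamed numeric vector
for (col in toy) {
  print(class(col))  # "numeric"; the column name is not available here
}

# Looping over chosen names: `col` is the name, toy[[col]] is the data
wanted <- c("Q2_3L", "Q1A_3L")  # specific columns, in any order
seen <- character(0)
for (col in wanted) {
  values <- toy[[col]]
  seen <- c(seen, col)
  cat(col, ":", values, "\n")
}
```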

Thanks in advance.

I think you can use something like this:

library(descr)

for (col in names(df)[5:22]) {
  ct_ <- crosstab(df[[col]], 
                  df[["survey_yr"]], 
                  weight = df[["weight"]], 
                  format = "SAS", 
                  prop.c = TRUE, 
                  plot = FALSE)
  ct_[["RowData"]] <- col
  ct_[["ColData"]] <- "survey_yr"
  print(ct_)
}
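If you need the tables later (for example, to export them), one option is to store each result in a list named after its column rather than only printing it.

The sketch below uses base-R `xtabs()` in place of `descr::crosstab()` so it runs without extra packages. With descr installed, the function body could call `crosstab()` as in the loop above. It assumes the same toy subset of the example data, extended with `Q1A_3L`.

```r
# Toy subset of the question's example data (weights rounded to 3 decimals)
dat <- data.frame(
  survey_yr = c(2019, 2020, 2019, 2020, 2019, 2020, 2019, 2020, 2019, 2020,
                2019, 2020, 2020, 2019, 2019, 2020, 2019, 2020, 2019, 2020),
  Q1A_3L    = c(3, 1, 3, 3, 3, 1, 3, 3, 3, 3, 1, 2, 3, 3, 3, 3, 3, 1, 3, 2),
  Q2_3L     = c(3, 2, 3, 3, 3, 1, 3, 3, 3, 3, 1, 1, 3, 3, 3, 3, 3, 1, 3, 2),
  weight    = c(0.68, 0.68, rep(0.823, 8), 1.271, rep(0.823, 6), 1.576,
                0.823, 0.823)
)

vars <- c("Q1A_3L", "Q2_3L")

# One weighted table per variable; setNames() keeps the original column names
tables <- lapply(setNames(vars, vars), function(v) {
  f <- as.formula(paste("weight ~", v, "+ survey_yr"))
  xtabs(f, data = dat)  # dimnames are labelled with v and "survey_yr"
})

names(tables)
tables[["Q2_3L"]]
```

Because the formula is built from the column name, each table's row dimension is labelled with the real variable name rather than "col".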

`names(df)[5:22]` selects the columns to loop over, such as "Q1H_3L". This returns something like:
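Positions such as `5:22` silently pick the wrong columns if columns are added or reordered. Below is a sketch of selecting the same columns by name instead. It assumes, as in the example data, that every question column ends in `_3L`. With the real data you would call `grep("_3L$", names(df), value = TRUE)`.

```r
# The 23 column names of the question's example data frame
cols <- c("survey_yr", "Main_Data", "Survey_Month", "Quarter",
          paste0("Q1", LETTERS[1:17], "_3L"),  # Q1A_3L ... Q1Q_3L
          "Q2_3L", "weight")

# Keep only the names that end in "_3L"
q_cols <- grep("_3L$", cols, value = TRUE)
length(q_cols)
```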

   Cell Contents 
|-------------------------|
|                       N | 
|           N / Col Total | 
|-------------------------|

===============================
          survey_yr
Q1Q_3L     2019    2020   Total
-------------------------------
1             1       2       3
          0.125   0.222        
-------------------------------
2             0       2       2
          0.000   0.222        
-------------------------------
3             7       5      12
          0.875   0.556        
-------------------------------
Total         8       9      17
          0.471   0.529        
===============================
   Cell Contents 
|-------------------------|
|                       N | 
|           N / Col Total | 
|-------------------------|

==============================
         survey_yr
Q2_3L     2019    2020   Total
------------------------------
1            1       3       4
         0.125   0.333        
------------------------------
2            0       2       2
         0.000   0.222        
------------------------------
3            7       4      11
         0.875   0.444        
------------------------------
Total        8       9      17
         0.471   0.529        
==============================

Printing to txt

You can save this output to a file (e.g. a .txt file) using sink():
for (col in names(df)[5:22]) {
  sink(file = paste0(col, ".txt"))
  ct_ <- crosstab(df[[col]], 
                  df[["survey_yr"]], 
                  weight = df[["weight"]], 
                  format = "SAS", 
                  prop.c = TRUE, 
                  plot = FALSE)
  ct_[["RowData"]] <- col
  ct_[["ColData"]] <- "survey_yr"
  print(ct_)
  sink()
}

This creates multiple files in your current working directory, such as Q1A_3L.txt and Q1B_3L.txt.
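For the Excel part of the question, one option is to write each weighted table to a CSV file, which Excel can open directly. `capture.output()` can also collect all printed tables into a single .txt file instead of one file per variable.

This is a sketch under stated assumptions:
- base-R `xtabs()` replaces `descr::crosstab()`;
- the data is the same toy subset of the example data;
- files are written to `tempdir()`, so change `out_dir` to your own folder.

```r
# Toy subset of the question's example data (weights rounded to 3 decimals)
dat <- data.frame(
  survey_yr = c(2019, 2020, 2019, 2020, 2019, 2020, 2019, 2020, 2019, 2020,
                2019, 2020, 2020, 2019, 2019, 2020, 2019, 2020, 2019, 2020),
  Q1A_3L    = c(3, 1, 3, 3, 3, 1, 3, 3, 3, 3, 1, 2, 3, 3, 3, 3, 3, 1, 3, 2),
  Q2_3L     = c(3, 2, 3, 3, 3, 1, 3, 3, 3, 3, 1, 1, 3, 3, 3, 3, 3, 1, 3, 2),
  weight    = c(0.68, 0.68, rep(0.823, 8), 1.271, rep(0.823, 6), 1.576,
                0.823, 0.823)
)

vars <- c("Q1A_3L", "Q2_3L")
out_dir <- tempdir()  # assumption: replace with the folder you want
all_txt <- file.path(out_dir, "all_crosstabs.txt")
unlink(all_txt)       # start the combined text file fresh

for (v in vars) {
  tab <- xtabs(as.formula(paste("weight ~", v, "+ survey_yr")), data = dat)
  col_pct <- round(100 * prop.table(tab, margin = 2), 1)  # column percentages

  # One CSV per variable; the answer categories are written as row names
  write.csv(as.data.frame.matrix(col_pct),
            file.path(out_dir, paste0(v, "_colpct.csv")))

  # Append the printed table (with its variable name) to one text file
  capture.output(print(col_pct), file = all_txt, append = TRUE)
}
```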