How to create count and percentage tables and line graphs with 1 independent variable and 3 dependent ones

Question

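I'm an R newbie, and this problem doesn't seem like it should be hard to solve. Unfortunately, after about three days of searching and experimenting, I still can't do it.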

My data is in a near-wide format:

color   agegroup    sex     ses
red     2           Female  A
blue    2           Female  C
green   5           Male    D
red     3           Female  A
red     2           Male    B
blue    1           Female  B
...

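I'm trying to create presentable tables that organize the counts and percentages of the dependent variable (color here) by sex, ses and agegroup. I need a table organized by ses and sex for each agegroup, with the percentages next to the counts, like this: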

agegroup:                                  1
sex:                  Female                               Male
ses:        A       B       C       D           A       B       C       D
color:
red         2 1%    0  0%   8 4%    22 11%      16 8%   2   1%  8   4%  3 1.5%
blue        9 4.5%  6  3%   4 2%    2  1%       12 6%   32 16%  14  7%  6   3%
green       4 2%    12 6%   2 1%    8  4%       0  0%   22 11%  40 20%  0   0%

agegroup:                               2
sex:                  Female                               Male
ses:        A       B       C       D           A       B       C       D
color:
red         2 1%    0  0%   8 4%    22 11%      16 8%   2   1%  8   4%  3 1.5%
blue        9 4.5%  6  3%   4 2%    2  1%       12 6%   32 16%  14  7%  6   3%
green       4 2%    12 6%   2 1%    8  4%       0  0%   22 11%  40 20%  0   0%
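The nested layout itself (sex with ses under it in the columns, one block of rows per agegroup) is something base R's ftable() can already print, at least for the counts. A minimal sketch, assuming only base R and the six sample rows above typed in by hand; it does not add the percentages:

```r
# The six sample rows from the question, typed in by hand
df <- data.frame(
    color    = c("red", "blue", "green", "red", "red", "blue"),
    agegroup = c(2, 2, 5, 3, 2, 1),
    sex      = c("Female", "Female", "Male", "Female", "Male", "Female"),
    ses      = c("A", "C", "D", "A", "B", "B")
)

# Four-way contingency table of counts (character columns become factors)
counts <- xtabs(~ color + sex + ses + agegroup, data = df)

# Flatten it: agegroup and color in the rows, ses nested under sex in the columns
ft <- ftable(counts,
             row.vars = c("agegroup", "color"),
             col.vars = c("sex", "ses"))
print(ft)
```

If a text file is enough, write.ftable() from the stats package writes the same layout out.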

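I've been trying to do this with everything from datatables and expss to gmodels, but I just can't figure out how to get output like this. CrossTable from gmodels comes closest, but it's still far off: (1) it puts the percentages below the counts, (2) I can't figure out how to nest ses under sex, (3) I can't work out how to make it break the results down by agegroup, and (4) the output is full of dashes, vertical pipes and spaces, which makes moving it into a word processor or spreadsheet an error-prone manual job.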
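For points (1), (3) and (4), one base R workaround is to paste each count and its percentage into a single text cell and keep one plain data frame per agegroup. This is a sketch under a few assumptions: the data are simulated the same way as in Answer 1, the helper name count_pct_table is hypothetical, and each percentage is taken as a share of that agegroup's total (swap in prop.table(counts, 2) if column percentages are wanted instead):

```r
# Simulated data with the question's columns (assumption, mirrors Answer 1)
set.seed(123)
N <- 300
df <- data.frame(
    color    = sample(c("red", "blue", "green"), N, replace = TRUE),
    agegroup = sample(1:5, N, replace = TRUE),
    sex      = sample(c("Female", "Male"), N, replace = TRUE),
    ses      = sample(c("A", "B", "C", "D"), N, replace = TRUE)
)

# Fix the row order, and build one column per sex/ses pair with sex varying slowest
df$color   <- factor(df$color, levels = c("red", "blue", "green"))
df$sex_ses <- interaction(df$sex, df$ses, sep = " ", lex.order = TRUE)

# Hypothetical helper: "count pct%" in every cell for one agegroup's rows
count_pct_table <- function(d) {
    counts <- table(d$color, d$sex_ses)          # all 8 sex/ses levels kept
    pct    <- round(100 * counts / sum(counts), 1)  # share of this agegroup's total
    cells  <- matrix(paste0(counts, " ", pct, "%"),
                     nrow = nrow(counts), dimnames = dimnames(counts))
    as.data.frame.matrix(cells)
}

# One plain data frame per agegroup, named "1" ... "5"
tables <- lapply(split(df, df$agegroup), count_pct_table)
print(tables[["1"]])
```

Because each element of tables is an ordinary data frame of text, something like write.csv(tables[["1"]], "agegroup1.csv") should open in a spreadsheet without the dashes and pipes of CrossTable's printout.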

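Edit: I removed my second question (about line graphs), because the answer to the first question is perfect and deserves credit even though it doesn't touch the second one. I'll ask the second question separately, as I should have from the start.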

Answer 1

The closest result with the expss package:

library(expss)
# generate example data
set.seed(123)
N = 300
df = data.frame(
    color = sample(c("red", "blue", "green"), size = N, replace = TRUE),
    agegroup = sample(1:5, size = N, replace = TRUE),
    sex = sample(c("Male", "Female"), size = N, replace = TRUE),
    ses = sample(c("A", "B", "C", "D"),  size = N, replace = TRUE),
    stringsAsFactors = FALSE
)

# redirect output to RStudio HTML viewer
expss_output_viewer()
res = df %>% 
    tab_cells("|" = color) %>% # dependent variable, "|" used to suppress label
    tab_cols(sex %nest% ses) %>% # column variable
    tab_rows(agegroup) %>% 
    tab_total_row_position("none") %>% # we don't need total
    tab_stat_cases(label = "Cases") %>% # calculate cases
    tab_stat_cpct(label = "%") %>% # calculate percent
    tab_pivot(stat_position = "inside_columns") %>% # finalize table
    make_subheadings(number_of_columns = 2)

# difficult part - add percent sign
for(i in grep("%", colnames(res))){
    res[[i]] = ifelse(trimws(res[[i]])!="", 
                      paste0(round(res[[i]], 1), "%"),
                      res[[i]] 
                      )
}

# additionally, remove stat labels
colnames(res) = gsub("\\|Cases|%", "", colnames(res), perl = TRUE)

res

In the RStudio viewer, the result is displayed as HTML (see image). Unfortunately, I can't test how it pastes into MS Word. Disclaimer: I am the author of the expss package.

Answer 2

You can use adorn_ns(position = "front") from the janitor package. It gives you both counts and percentages.

For example, this code:

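library(dplyr)
library(janitor)

# adorn_* functions expect a tabyl, so cross-tabulate first
df %>%
    tabyl(color, ses) %>%
    adorn_percentages("col") %>%       # column proportions
    adorn_pct_formatting() %>%         # format as percentages, e.g. "25.0%"
    adorn_ns(position = "front") %>%   # put the counts in front of the percentages
    as.data.frame()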

gives a table in which each cell shows the count first, followed by its percentage in parentheses.
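The same idea can be stretched to the question's nesting. A sketch, assuming R 4.1 or later (for the |> pipe), simulated data like Answer 1's, and janitor's behaviour of returning a three-way tabyl as a list of two-way tables (one per sex) to which the adorn_* functions are applied element by element:

```r
library(janitor)

# Simulated data with the question's columns (assumption, mirrors Answer 1)
set.seed(123)
N <- 300
df <- data.frame(
    color    = sample(c("red", "blue", "green"), N, replace = TRUE),
    agegroup = sample(1:5, N, replace = TRUE),
    sex      = sample(c("Male", "Female"), N, replace = TRUE),
    ses      = sample(c("A", "B", "C", "D"), N, replace = TRUE)
)

# For each agegroup: color x ses, split by sex into a list of tabyls,
# then column percentages with the counts placed in front
by_age <- lapply(split(df, df$agegroup), function(d) {
    d |>
        tabyl(color, ses, sex) |>
        adorn_percentages("col") |>
        adorn_pct_formatting() |>
        adorn_ns(position = "front")
})

print(by_age[["1"]])
```

This gives sex/ses nesting and the agegroup breakdown as nested lists rather than one wide table, so the sex blocks would still need to be placed side by side (for example with cbind) for a single sheet.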
