How do I add columns and rows for conditional summaries to a data frame (totals and percentages)? Is there a piped workflow method using R?

Question

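For presentation purposes I often need to format data frames with column and row totals and percentages.

Adding row totals and percentages conditionally is simple to pipe: Stack Overflow e.g.

Column totals can be piped neatly too:

Option 1: Stack Overflow e.g.

Option 2: use the janitor package function adorn_totals (although I would prefer a method that does not add more packages to my workflow).

I get stuck at the next step, which is adding a column % row below the column totals. This row gives each column sum (column total) as a percentage of the sum of the whole table (table total).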
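As a minimal base R sketch of the arithmetic this row needs, using the same values as the MWE further down (the names `dat`, `col_totals`, `table_total` and `col_pc` are illustrative only and not part of the original workflow):

```r
# Numeric columns from the example data (v2 = 1:3, v3 = 4:6)
dat <- data.frame(v2 = 1:3, v3 = 4:6)

col_totals  <- colSums(dat)      # column totals: v2 = 6, v3 = 15
table_total <- sum(col_totals)   # table total: 21

# Each column total as a percentage of the table total
col_pc <- col_totals * 100 / table_total
round(col_pc, 1)                 # v2 = 28.6, v3 = 71.4
```

The difficulty in the piped version is only that `table_total` has to be known before the percentage row can be computed.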

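Here I have to break up my workflow to do the following:

Create a table total variable
Write a function to calculate the percentage of a vector
Calculate the column percentage row
Bind the column percentage row to the table

This process feels cumbersome and I am sure there is a better way; suggestions are welcome.

This is what I'm aiming for

Once the table has been built and tidied for presentation, I usually use flextable or kableExtra as a second pass.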

MWE

library(tidyverse)

tib <- tibble(v1 = c("a", "b", "c"),
              v2 = 1:3,
              v3 = 4:6)

# piping row summaries and column totals
tib <- 
  tib %>% 
  mutate(r_sum = rowSums(.[2:3]),
         r_pc = r_sum * 100/sum(r_sum)) %>% 
  bind_rows(summarise_all(., ~ if (is.numeric(.)) sum(.) else "Total"))


# extract gross total
table_total <- tib$r_sum[4]

# function to calculate percentage * 2 as tib includes a column total row
calc_pc <- function(x) {sum(x) * 100 / (table_total * 2)}

# calculate column percentages
col_pc <- 
  tib %>% 
  summarise_at(vars(v1:r_sum), ~ if (is.numeric(.)) calc_pc(.) else "Column %")

# finally bringing it all together for the desired result
tib <- 
  tib %>% 
  bind_rows(col_pc)

Answer 1

With janitor, once the total has been calculated up front, everything else can be done in a single pipe.

library(janitor, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

tib <- tibble(v1 = c("a", "b", "c"), v2 = 1:3, v3 = 4:6)

total <- tib %>% select(where(is.numeric)) %>% sum

tib %>% 
  adorn_totals(c("row", "col")) %>% 
  rowwise() %>%
  mutate("Row %" = round(sum(across(where(is.numeric)))/total*50)) %>%
  ungroup %>%
  bind_rows(summarise(., across(where(is.numeric), ~round(sum(.)/total*50)))) %>%
  `[[<-`(nrow(.), 1, value = "Column %") %>%
  `[[<-`(nrow(.), ncol(.), value = NA)
#> # A tibble: 5 x 5
#>   v1          v2    v3 Total `Row %`
#>   <chr>    <dbl> <dbl> <dbl>   <dbl>
#> 1 a            1     4     5      24
#> 2 b            2     5     7      33
#> 3 c            3     6     9      43
#> 4 Total        6    15    21     100
#> 5 Column %    29    71   100      NA

Created on 2020-05-30 by the reprex package (v0.3.0)
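
The multiplier of 50 rather than 100 is not explained in the answer. It appears to follow from the fact that, after `adorn_totals()`, each row sum also includes the `Total` column and each column sum also includes the `Total` row, so every sum is twice the underlying value. A quick base R check of that reading, using the first row and the first numeric column of the output above (`row_a` and `col_v2` are illustrative names):

```r
total <- 21                       # sum of v2 and v3 in the original data

# Row "a" after the Total column is added: v2, v3, Total
row_a <- c(1, 4, 5)
sum(row_a)                        # 10, twice the true row sum of 5
round(sum(row_a) / total * 50)    # 24, same as round(5 / total * 100)

# Column v2 after the Total row is added: a, b, c, Total
col_v2 <- c(1, 2, 3, 6)
round(sum(col_v2) / total * 50)   # 29, same as round(6 / total * 100)
```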

Or, slightly longer, without janitor:

library(dplyr, warn.conflicts = FALSE)

tib <- tibble(v1 = c("a", "b", "c"), v2 = 1:3, v3 = 4:6)

total <- tib %>% select(where(is.numeric)) %>% sum

tib %>% 
  rowwise() %>%
  mutate(
    Total = sum(across(where(is.numeric))),
    "Row %" = round(sum(across(where(is.numeric)))/total*50)
  ) %>%
  ungroup %>%
  bind_rows(summarise(., across(where(is.numeric), sum))) %>%
  `[[<-`(nrow(.), 1, value = "Total") %>%
  bind_rows(summarise(., across(where(is.numeric), ~round(sum(.)/total*50)))) %>%
  `[[<-`(nrow(.), 1, value = "Column %") %>%
  `[[<-`(nrow(.), ncol(.), value= NA)
#> # A tibble: 5 x 5
#>   v1          v2    v3 Total `Row %`
#>   <chr>    <dbl> <dbl> <dbl>   <dbl>
#> 1 a            1     4     5      24
#> 2 b            2     5     7      33
#> 3 c            3     6     9      43
#> 4 Total        6    15    21     100
#> 5 Column %    29    71   100      NA

Created on 2020-05-30 by the reprex package (v0.3.0)

Of course, I could shorten both a bit if you don't care about the row names.
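
If this kind of formatting is needed often, the no-janitor steps could also be wrapped in a small helper. The following is only a sketch and not the answer author's code: `add_margins` is a hypothetical name, and it assumes that the first column holds the row labels and that every other column is numeric. It computes the percentages from the original numeric columns, so the factor of 50 is not needed.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper: appends a Total column, a Row % column,
# a Total row and a Column % row to a data frame.
add_margins <- function(df) {
  label <- names(df)[1]                    # assumed: first column = labels
  num   <- select(df, where(is.numeric))   # original numeric columns only
  grand <- sum(num)                        # table total

  # Totals row: column sums, labelled in the first column
  totals <- summarise(num, across(everything(), sum))
  totals[[label]] <- "Total"

  # Column % row: each column sum as a percentage of the table total
  pcs <- summarise(num, across(everything(), ~ round(sum(.x) * 100 / grand)))
  pcs[[label]] <- "Column %"

  # Row margins for the data rows and the Totals row
  out <- bind_rows(df, totals) %>%
    mutate(Total   = rowSums(across(where(is.numeric))),
           `Row %` = round(Total * 100 / grand))

  # The Column % row gets Total = 100 and an NA in Row %
  bind_rows(out, mutate(pcs, Total = 100))
}

tib <- tibble(v1 = c("a", "b", "c"), v2 = 1:3, v3 = 4:6)
add_margins(tib)
```

For the example data this should give the same five rows as the two outputs above, and the result can still be passed on to flextable or kableExtra.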
