R: create summary percentages in a data frame using cells that become different denominators

Here is my data frame. It is not very long, only six rows.

df <- structure(list(Send_Month = c("2021-05", "2021-06", "2021-07", 
"2021-05", "2021-06", "2021-07"), Order_Result = c("No", "No", 
"No", "Yes", "Yes", "Yes"), Email_Send = c(135, 495, 475, 7, 
28, 25), Unique_Email_Opens = c(45, 149, 143, 7, 28, 25), Unique_Email_Clicks = c(6, 
21, 10, 7, 28, 25), Total_Orders = c(37, 106, 46, 7, 28, 25)), row.names = c(NA, 
-6L), groups = structure(list(Send_Month = c("2021-05", "2021-06", 
"2021-07"), .rows = structure(list(c(1L, 4L), c(2L, 5L), c(3L, 
6L)), ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr", 
"list"))), row.names = c(NA, -3L), class = c("tbl_df", "tbl", 
"data.frame"), .drop = TRUE), class = c("grouped_df", "tbl_df", 
"tbl", "data.frame"))

I can't figure out how to get summary results that I can plot as a bar chart. I'd like to do some grouping here:

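When the month is the same (e.g. "2021-05") and I look at the Email_Send variable, I can see that 7 of the 142 emails sent (i.e. 135 + 7) resulted in an order. I can also see that 7 of the 52 emails opened (i.e. 45 + 7) resulted in an order. Of the 13 emails clicked (i.e. 6 + 7), 7 resulted in an order. That is for the "2021-05" group.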
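For reference, the 2021-05 arithmetic can be written out as a quick check. This is only a sketch of the calculation for one month; the vector names `no` and `yes` are illustrative and the counts are copied from the data frame above.

```r
# Counts for the "2021-05" group, copied from the data frame above
no  <- c(Email_Send = 135, Unique_Email_Opens = 45, Unique_Email_Clicks = 6)
yes <- c(Email_Send = 7,   Unique_Email_Opens = 7,  Unique_Email_Clicks = 7)

# Each column has its own denominator: the "No" count plus the "Yes" count
denominators <- no + yes   # 142, 52, 13

# Share of each column total that resulted in an order
rate <- yes / denominators
round(rate, 4)             # roughly 0.0493, 0.1346, 0.5385
```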

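How can I create these statistics for every group, so that I can see how the percentages change from group to group even though the denominator keeps changing?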

I tried the janitor package for a moment, just to get my bearings. I first filtered to keep only the 2021-05 group:

library(dplyr)
library(janitor)

# keep only the 2021-05 rows
df_may <- df %>%
  filter(Send_Month == "2021-05")

# append a totals row
df_may %>%
  adorn_totals("row")

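But I don't know whether this approach is flexible enough for looking at all groups together, and I also don't know whether I really want a summary row or a new column. So I'm not sure I'm heading in the right direction.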

Update

If you don't want the output as a list, you can try

df %>%
  group_by(Send_Month) %>%
  mutate(across(Email_Send:Total_Orders, proportions)) %>%
  ungroup()

which gives

  Send_Month Order_Result Email_Send Unique_Email_Opens Unique_Email_Clicks
  <chr>      <chr>             <dbl>              <dbl>               <dbl>
1 2021-05    No               0.951               0.865               0.462
2 2021-06    No               0.946               0.842               0.429
3 2021-07    No               0.95                0.851               0.286
4 2021-05    Yes              0.0493              0.135               0.538
5 2021-06    Yes              0.0535              0.158               0.571
6 2021-07    Yes              0.05                0.149               0.714
# ... with 1 more variable: Total_Orders <dbl>
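Since the question asks for something that can be drawn as a bar chart, one possible follow-up is to keep only the "Yes" rows and reshape them to long format. This is a minimal sketch, not part of the original answer: `df` is rebuilt here as a plain ungrouped data frame with the same values, the names `rates`, `Metric` and `Share` are made up for illustration, and Total_Orders is left out on the assumption that only the three email columns are of interest.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Same values as the data frame in the question, rebuilt without grouping
df <- data.frame(
  Send_Month          = rep(c("2021-05", "2021-06", "2021-07"), times = 2),
  Order_Result        = rep(c("No", "Yes"), each = 3),
  Email_Send          = c(135, 495, 475, 7, 28, 25),
  Unique_Email_Opens  = c(45, 149, 143, 7, 28, 25),
  Unique_Email_Clicks = c(6, 21, 10, 7, 28, 25),
  Total_Orders        = c(37, 106, 46, 7, 28, 25)
)

rates <- df %>%
  group_by(Send_Month) %>%
  # within each month, divide every cell by its column total
  mutate(across(Email_Send:Unique_Email_Clicks, proportions)) %>%
  ungroup() %>%
  # keep the share that resulted in an order
  filter(Order_Result == "Yes") %>%
  select(-Order_Result, -Total_Orders) %>%
  # one row per month and metric, ready for plotting
  pivot_longer(-Send_Month, names_to = "Metric", values_to = "Share")

# Dodged bars: one cluster per month, one bar per metric
p <- ggplot(rates, aes(x = Send_Month, y = Share, fill = Metric)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::percent)
p
```

The long format gives one row per month and metric, which is the shape `geom_col()` expects when the bars are filled by metric.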

Perhaps you can try the code below

> lapply(split(df, df$Send_Month), function(x) {x[-(1:2)]<-proportions(as.matrix(x[-(1:2)]), 2);x})
$`2021-05`
# A tibble: 2 x 6
# Groups:   Send_Month [1]
  Send_Month Order_Result Email_Send Unique_Email_Opens Unique_Email_Clicks
  <chr>      <chr>             <dbl>              <dbl>               <dbl>
1 2021-05    No               0.951               0.865               0.462
2 2021-05    Yes              0.0493              0.135               0.538
# ... with 1 more variable: Total_Orders <dbl>

$`2021-06`
# A tibble: 2 x 6
# Groups:   Send_Month [1]
  Send_Month Order_Result Email_Send Unique_Email_Opens Unique_Email_Clicks
  <chr>      <chr>             <dbl>              <dbl>               <dbl>
1 2021-06    No               0.946               0.842               0.429
2 2021-06    Yes              0.0535              0.158               0.571
# ... with 1 more variable: Total_Orders <dbl>

$`2021-07`
# A tibble: 2 x 6
# Groups:   Send_Month [1]
  Send_Month Order_Result Email_Send Unique_Email_Opens Unique_Email_Clicks
  <chr>      <chr>             <dbl>              <dbl>               <dbl>
1 2021-07    No                 0.95              0.851               0.286
2 2021-07    Yes                0.05              0.149               0.714
# ... with 1 more variable: Total_Orders <dbl>

Thanks to dear @ThomasIsCoding for the brilliant tip of using the proportions function instead of .x/sum(.x).

library(dplyr)
library(purrr)

df %>%
  group_by(Send_Month, .add = TRUE) %>%
  group_split() %>%
  map(~ .x %>% 
        mutate(across(!c(1, 2), ~ proportions(.x))))

[[1]]
# A tibble: 2 x 6
  Send_Month Order_Result Email_Send Unique_Email_Opens Unique_Email_Clicks Total_Orders
  <chr>      <chr>             <dbl>              <dbl>               <dbl>        <dbl>
1 2021-05    No               0.951               0.865               0.462        0.841
2 2021-05    Yes              0.0493              0.135               0.538        0.159

[[2]]
# A tibble: 2 x 6
  Send_Month Order_Result Email_Send Unique_Email_Opens Unique_Email_Clicks Total_Orders
  <chr>      <chr>             <dbl>              <dbl>               <dbl>        <dbl>
1 2021-06    No               0.946               0.842               0.429        0.791
2 2021-06    Yes              0.0535              0.158               0.571        0.209

[[3]]
# A tibble: 2 x 6
  Send_Month Order_Result Email_Send Unique_Email_Opens Unique_Email_Clicks Total_Orders
  <chr>      <chr>             <dbl>              <dbl>               <dbl>        <dbl>
1 2021-07    No                 0.95              0.851               0.286        0.648
2 2021-07    Yes                0.05              0.149               0.714        0.352
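To illustrate the tip about proportions, here is a small base R sketch of the same per-month calculation without dplyr or purrr. It is an alternative offered under assumptions, not the approach used above: `df` is rebuilt as a plain data frame with the same values, and `num_cols` and `res` are illustrative names. On older R versions that lack `proportions()`, `prop.table()` is the equivalent.

```r
# Same values as the data frame in the question, rebuilt without grouping
df <- data.frame(
  Send_Month          = rep(c("2021-05", "2021-06", "2021-07"), times = 2),
  Order_Result        = rep(c("No", "Yes"), each = 3),
  Email_Send          = c(135, 495, 475, 7, 28, 25),
  Unique_Email_Opens  = c(45, 149, 143, 7, 28, 25),
  Unique_Email_Clicks = c(6, 21, 10, 7, 28, 25),
  Total_Orders        = c(37, 106, 46, 7, 28, 25)
)

# proportions(x) on a plain vector is the same as x / sum(x)
x <- c(135, 7)
stopifnot(isTRUE(all.equal(proportions(x), x / sum(x))))

num_cols <- c("Email_Send", "Unique_Email_Opens",
              "Unique_Email_Clicks", "Total_Orders")

res <- df
# ave() applies FUN within each Send_Month and returns values in row order
res[num_cols] <- lapply(df[num_cols], function(col) {
  ave(col, df$Send_Month, FUN = proportions)
})
res
```

Because `ave()` keeps the original row order, `res` has the same six rows as `df`, with each numeric cell replaced by its share of the month's column total.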

With janitor, assuming you want to exempt the column Total_Orders from the percentage calculation:

library(janitor)
library(tidyverse)

split(df, df$Send_Month) %>%
  map_df(adorn_percentages, "col", TRUE, -c(1, 2, Total_Orders))

# A tibble: 6 x 6
# Groups:   Send_Month [3]
  Send_Month Order_Result Email_Send Unique_Email_Opens Unique_Email_Clicks Total_Orders
  <chr>      <chr>             <dbl>              <dbl>               <dbl>        <dbl>
1 2021-05    No               0.951               0.865               0.462           37
2 2021-05    Yes              0.0493              0.135               0.538            7
3 2021-06    No               0.946               0.842               0.429          106
4 2021-06    Yes              0.0535              0.158               0.571           28
5 2021-07    No               0.95                0.851               0.286           46
6 2021-07    Yes              0.05                0.149               0.714           25