Cumsum table with grouping

How can I get a cumsum table grouped by Gender and State?

Gender = sample(c('male', 'female'), 100, replace=TRUE)
State = sample(c('CA', 'WA', 'NV', 'OR', "AZ"), 100, replace=TRUE)
Number = sample(1:8, size=100, replace=TRUE)

df <- data.frame(Gender,State, Number)

For a simpler approach, I'd suggest dplyr. It is loaded, along with a bunch of other useful packages, when you load the tidyverse.

library(tidyverse)

Gender = sample(c('male', 'female'), 100, replace=TRUE)
State = sample(c('CA', 'WA', 'NV', 'OR', "AZ"), 100, replace=TRUE)
Number = sample(1:8, size=100, replace=TRUE)

df <- data.frame(Gender,State, Number)

df <- df %>% 
  group_by(Gender, State) %>% 
  mutate(Number_CumSum = cumsum(Number)) %>% 
  ungroup() %>% 
  arrange(State, Gender)

head(df)

# A tibble: 6 x 4
  Gender  State Number Number_CumSum
  <fctr> <fctr>  <int>         <int>
1 female     AZ      8             8
2 female     AZ      3            11
3 female     AZ      4            15
4 female     AZ      5            20
5 female     AZ      2            22
6 female     AZ      7            29
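If you would rather not load any packages, the same grouped running total can be sketched in base R with ave(), which applies a function (here cumsum) separately within each combination of the grouping variables and returns the results in the original row order. This is an alternative to the dplyr answer, not part of it; the seed and the df_base name are only there to make the example self-contained.

```r
# Base-R sketch of the grouped cumulative sum (alternative to dplyr)
set.seed(1)  # hypothetical seed, only so the example is reproducible
Gender <- sample(c('male', 'female'), 100, replace = TRUE)
State  <- sample(c('CA', 'WA', 'NV', 'OR', 'AZ'), 100, replace = TRUE)
Number <- sample(1:8, size = 100, replace = TRUE)
df_base <- data.frame(Gender, State, Number)

# ave() runs cumsum within each Gender/State group, keeping row order
df_base$Number_CumSum <- ave(df_base$Number, df_base$Gender, df_base$State,
                             FUN = cumsum)

# Sort the same way as the dplyr version for easier reading
df_base <- df_base[order(df_base$State, df_base$Gender), ]
head(df_base)
```

Because every Number is at least 1, the running total only grows, so the largest Number_CumSum in each group equals that group's total.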

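If what you are after is a cumulative table, i.e. the cumulative percentage of each Number value within every Gender/State combination, then with data.table: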

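library(data.table)
# Count rows per Gender/State/Number, then turn the counts into
# percentages within each Gender/State group
cnt <- setDT(df)[, .N, .(Gender, State, Number)
                ][, perc := round(100*N/sum(N), 2), .(Gender, State)]
# One row per Gender/State, one column per Number value (absent -> 0)
res <- dcast(cnt, Gender + State ~ Number, value.var = 'perc', fill = 0, drop = FALSE)
# Accumulate the percentage columns left to right and append "%"
num_cols <- setdiff(names(res), c('Gender', 'State'))
res[, (num_cols) := lapply(Reduce(`+`, .SD, accumulate = TRUE),
                           function(x) paste0(x, "%")), .SDcols = num_cols][]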
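To see what the cumulative table contains, here is a base-R sketch of the same computation on a tiny hypothetical data set (toy, counts, perc and cum_perc are illustrative names, not from the answer above): count each Number per Gender/State group, convert the counts to row percentages, then take a running sum across the Number columns. It keeps the values numeric rather than pasting "%", so they are easy to check by hand.

```r
# Hypothetical toy data, small enough to verify by hand
toy <- data.frame(
  Gender = c('male', 'male', 'male', 'female'),
  State  = c('CA',   'CA',   'CA',   'CA'),
  Number = c(1, 2, 2, 3)
)

# Counts of each Number within each Gender/State combination
group <- interaction(toy$Gender, toy$State, sep = '.')
counts <- table(group, toy$Number)

# Row-wise percentages, then cumulative sums across the Number columns
perc <- prop.table(counts, margin = 1) * 100
cum_perc <- t(apply(perc, 1, cumsum))
cum_perc
```

For male.CA the counts are 1, 2, 0 (percentages 33.33, 66.67, 0), so the cumulative row is 33.33, 100, 100; for female.CA it is 0, 0, 100. Note that a Gender/State combination with no rows at all would produce NaN here, whereas the data.table version fills it with 0.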