Percentage of annual group total by subcategory

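I am trying to convert a data frame into a summary that has annual totals plus a percentage breakdown by subcategory. For example, if I have this data: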

name year prod_type prod_color revenue
    a 2012       car        red    1000
    b 2012       car       blue    2000
    c 2012      boat        red    4000
    d 2012     plane       blue    5000
    a 2014      boat      green    9000
    b 2014       car        red    2000
    c 2014     plane       blue    6000
    a 2014     plane       blue   10000

I would like to create a table like the following:

 name year yr_total_rev pct_car_rev pct_boat_rev pct_plane_rev pct_red_car_rev pct_blue_car_rev
1    a 2012         1000          NA           NA            NA              NA               NA
2    a 2014        19000          NA           NA            NA              NA               NA
3    b 2012         2000          NA           NA            NA              NA               NA
4    b 2014         2000          NA           NA            NA              NA               NA
5    c 2012         4000          NA           NA            NA              NA               NA
6    c 2014         6000          NA           NA            NA              NA               NA
7    d 2012         5000          NA           NA            NA              NA               NA

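except that, in place of the NAs, each cell would hold the percentage of "yr_total_rev" for that name/year pair. For example, for name a, car revenue is 100% in 2012 but 0% in 2014, while boat and plane revenue are each roughly 50% in 2014, and so on.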

Thanks in advance for any help!

The sample data is below:

df <- data.frame("name"=c(letters[1:4], c(letters[1:3], "a")), 
                 "year"=c(rep(2012,4), rep(2014, 4)),
                 "prod_type"=c("car","car","boat","plane","boat","car","plane","plane"),
                 "prod_color"=c("red","blue","red","blue","green","red","blue","blue"),
                 "revenue"=c(1000,2000,4000,5000,9000,2000,6000, 10000))

In the code below, I join three separate summaries:

library(tidyverse)

# Summary 1: total revenue per name/year
dat.summary = df %>% group_by(name, year) %>% 
  summarise(yr_total=sum(revenue)) %>% 
  # Summary 2: share of the name/year total by product type, one column per type
  left_join(df %>% group_by(name, year, prod_type) %>% 
      summarise(rev=sum(revenue)) %>% 
      group_by(name, year) %>% 
      mutate(Percent=rev/sum(rev)) %>%
      select(-rev) %>% 
      spread(prod_type, Percent), by=c("name", "year")) %>% 
  # Summary 3: share by product type and color, one column per type_color pair
  left_join(df %>% group_by(name, year, prod_type, prod_color) %>% 
      summarise(rev=sum(revenue)) %>% 
      group_by(name, year) %>% 
      mutate(Percent=rev/sum(rev)) %>%
      unite(type_color, prod_type, prod_color) %>% 
      select(-rev) %>% 
      spread(type_color, Percent), by=c("name", "year"))
    name  year yr_total      boat   car     plane boat_green boat_red car_blue car_red plane_blue
1      a  2012     1000        NA     1        NA         NA       NA       NA       1         NA
2      a  2014    19000 0.4736842    NA 0.5263158  0.4736842       NA       NA      NA  0.5263158
3      b  2012     2000        NA     1        NA         NA       NA        1      NA         NA
4      b  2014     2000        NA     1        NA         NA       NA       NA       1         NA
5      c  2012     4000 1.0000000    NA        NA         NA        1       NA      NA         NA
6      c  2014     6000        NA    NA 1.0000000         NA       NA       NA      NA  1.0000000
7      d  2012     5000        NA    NA 1.0000000         NA       NA       NA      NA  1.0000000

This can be shortened a bit by writing a function:

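# Sum revenue over the given grouping columns, then express each sum
# as a share of its name/year total. Uses the global data frame df.
fnc = function(...) {
  df %>% group_by(!!!quos(...)) %>% 
    summarise(rev=sum(revenue)) %>% 
    group_by(name, year) %>% 
    mutate(Percent=rev/sum(rev))
}

# Rename rev to yr_total so the columns match the longer version above
dat.summary = fnc(name, year) %>% select(-Percent) %>% rename(yr_total=rev) %>% 
  left_join(fnc(name, year, prod_type) %>%
              select(-rev) %>% 
              spread(prod_type, Percent), by=c("name", "year")) %>% 
  left_join(fnc(name, year, prod_type, prod_color) %>%
              unite(type_color, prod_type, prod_color) %>% 
              select(-rev) %>% 
              spread(type_color, Percent), by=c("name", "year"))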
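
If you want column names closer to the ones in the question (pct_car_rev, pct_red_car_rev, ...) and 0 rather than NA for missing combinations, here is a minimal sketch of the same idea using pivot_wider in place of spread. It is a variation, not the code above: it assumes dplyr >= 1.0.0 and tidyr >= 1.1.0, and the names by_type, by_color and result are just illustrative.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  name       = c("a", "b", "c", "d", "a", "b", "c", "a"),
  year       = c(rep(2012, 4), rep(2014, 4)),
  prod_type  = c("car", "car", "boat", "plane", "boat", "car", "plane", "plane"),
  prod_color = c("red", "blue", "red", "blue", "green", "red", "blue", "blue"),
  revenue    = c(1000, 2000, 4000, 5000, 9000, 2000, 6000, 10000)
)

# Yearly total plus the share of that total for each product type
by_type <- df %>%
  group_by(name, year, prod_type) %>%
  summarise(rev = sum(revenue), .groups = "drop") %>%
  group_by(name, year) %>%
  mutate(yr_total_rev = sum(rev), pct = rev / yr_total_rev) %>%
  ungroup() %>%
  select(-rev) %>%
  pivot_wider(names_from = prod_type, values_from = pct,
              names_glue = "pct_{prod_type}_rev", values_fill = 0)

# Share of the yearly total for each color/type combination
by_color <- df %>%
  group_by(name, year, prod_type, prod_color) %>%
  summarise(rev = sum(revenue), .groups = "drop") %>%
  group_by(name, year) %>%
  mutate(pct = rev / sum(rev)) %>%
  ungroup() %>%
  select(-rev) %>%
  pivot_wider(names_from = c(prod_color, prod_type), values_from = pct,
              names_glue = "pct_{prod_color}_{prod_type}_rev", values_fill = 0)

# One row per name/year, joined on explicit keys
result <- left_join(by_type, by_color, by = c("name", "year")) %>%
  arrange(name, year)
```

Only the combinations that actually occur in the data get a column, so for example there is no pct_green_car_rev here. The values are fractions between 0 and 1; multiply by 100 if you need percentages.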