Percentage of annual group total by subcategory
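I'm trying to turn a data frame into a summary of yearly totals, with each total broken down into percentages by subcategory. For example, if I have this data: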
name year prod_type prod_color revenue
a 2012 car red 1000
b 2012 car blue 2000
c 2012 boat red 4000
d 2012 plane blue 5000
a 2014 boat green 9000
b 2014 car red 2000
c 2014 plane blue 6000
a 2014 plane blue 10000
I'd like to create a table like this:
name year yr_total_rev pct_car_rev pct_boat_rev pct_plane_rev pct_red_car_rev pct_blue_car_rev
1 a 2012 1000 NA NA NA NA NA
2 a 2014 19000 NA NA NA NA NA
3 b 2012 2000 NA NA NA NA NA
4 b 2014 2000 NA NA NA NA NA
5 c 2012 4000 NA NA NA NA NA
6 c 2014 6000 NA NA NA NA NA
7 d 2012 5000 NA NA NA NA NA
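where each NA is replaced by the percentage of "yr_total_rev" for that name/year pair. For example, for name a, car revenue is 100% in 2012 but 0% in 2014, while boat and plane revenue are each roughly 50% in 2014 (9000/19000 and 10000/19000), and so on.
Thanks in advance for any help!
Sample data: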
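As a worked check of the percentages described above, here is the arithmetic for name a in 2014 in base R. The vector `a_2014` is just a's two 2014 rows copied by hand; it is an illustration, not part of any solution.

```r
# Revenue for name "a" in 2014, copied by hand from the sample rows
a_2014 <- c(boat = 9000, plane = 10000)

# yr_total_rev: the sum of all of a's revenue in 2014
yr_total_rev <- sum(a_2014)

# Each subcategory's share is its revenue divided by that yearly total
pct <- a_2014 / yr_total_rev

print(yr_total_rev)   # 19000
print(round(pct, 4))  # boat 0.4737, plane 0.5263
```

So "roughly 50%" is about 47% for boat and 53% for plane, and the shares within a name/year pair always sum to 1.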
df <- data.frame("name"=c(letters[1:4], c(letters[1:3], "a")),
"year"=c(rep(2012,4), rep(2014, 4)),
"prod_type"=c("car","car","boat","plane","boat","car","plane","plane"),
"prod_color"=c("red","blue","red","blue","green","red","blue","blue"),
"revenue"=c(1000,2000,4000,5000,9000,2000,6000, 10000))
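The yr_total_rev column on its own can be checked with base R's `aggregate()`, summing revenue over each name/year pair. This is only a sketch for verifying the totals, not a replacement for the full percentage summary.

```r
# Sample data from the question
df <- data.frame("name"=c(letters[1:4], c(letters[1:3], "a")),
                 "year"=c(rep(2012,4), rep(2014, 4)),
                 "prod_type"=c("car","car","boat","plane","boat","car","plane","plane"),
                 "prod_color"=c("red","blue","red","blue","green","red","blue","blue"),
                 "revenue"=c(1000,2000,4000,5000,9000,2000,6000, 10000))

# Sum revenue within each name/year pair (the yr_total_rev column)
totals <- aggregate(revenue ~ name + year, data = df, FUN = sum)

# Order rows as in the desired table: by name, then year
totals <- totals[order(totals$name, totals$year), ]
totals
```

The a/2014 row should come out as 19000, since a has two 2014 rows (9000 + 10000).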
In the code below I've joined three separate summaries:
library(tidyverse)

dat.summary = df %>%
  # 1. Total revenue per name/year
  group_by(name, year) %>%
  summarise(yr_total=sum(revenue)) %>%
  # 2. Share of each prod_type within the name/year total
  left_join(df %>%
              group_by(name, year, prod_type) %>%
              summarise(rev=sum(revenue)) %>%
              group_by(name, year) %>%
              mutate(Percent=rev/sum(rev)) %>%
              select(-rev) %>%
              spread(prod_type, Percent)) %>%
  # 3. Share of each prod_type/prod_color combination within the name/year total
  left_join(df %>%
              group_by(name, year, prod_type, prod_color) %>%
              summarise(rev=sum(revenue)) %>%
              group_by(name, year) %>%
              mutate(Percent=rev/sum(rev)) %>%
              unite(type_color, prod_type, prod_color) %>%
              select(-rev) %>%
              spread(type_color, Percent))
name year yr_total boat car plane boat_green boat_red car_blue car_red plane_blue
1 a 2012 1000 NA 1 NA NA NA NA 1 NA
2 a 2014 19000 0.4736842 NA 0.5263158 0.4736842 NA NA NA 0.5263158
3 b 2012 2000 NA 1 NA NA NA 1 NA NA
4 b 2014 2000 NA 1 NA NA NA NA 1 NA
5 c 2012 4000 1.0000000 NA NA NA 1 NA NA NA
6 c 2014 6000 NA NA 1.0000000 NA NA NA NA 1.0000000
7 d 2012 5000 NA NA 1.0000000 NA NA NA NA 1.0000000
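The output above names its columns boat, car_red and so on, while the question asked for names such as pct_car_rev and pct_red_car_rev. Below is a sketch that produces those names, assuming tidyr 1.1.0 or later (for `names_glue` and a plain function in `values_fn`). It swaps the superseded `spread()` for `pivot_wider()` and computes each row's share of its name/year total up front, so only one join is needed. The object names `base`, `by_type`, `by_color` and `result` are illustrative.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame("name"=c(letters[1:4], c(letters[1:3], "a")),
                 "year"=c(rep(2012,4), rep(2014, 4)),
                 "prod_type"=c("car","car","boat","plane","boat","car","plane","plane"),
                 "prod_color"=c("red","blue","red","blue","green","red","blue","blue"),
                 "revenue"=c(1000,2000,4000,5000,9000,2000,6000, 10000))

# Each row's revenue as a share of its name/year total
base <- df %>%
  group_by(name, year) %>%
  mutate(yr_total = sum(revenue),          # name/year total, repeated on each row
         share = revenue / yr_total) %>%   # this row's share of that total
  ungroup()

# One column per prod_type; values_fn = sum adds up shares when a pair
# has several rows of the same type
by_type <- base %>%
  select(name, year, yr_total, prod_type, share) %>%
  pivot_wider(names_from = prod_type, values_from = share,
              values_fn = sum, names_glue = "pct_{prod_type}_rev")

# One column per color/type combination, e.g. pct_red_car_rev
by_color <- base %>%
  select(name, year, prod_color, prod_type, share) %>%
  pivot_wider(names_from = c(prod_color, prod_type), values_from = share,
              values_fn = sum, names_glue = "pct_{prod_color}_{prod_type}_rev")

result <- left_join(by_type, by_color, by = c("name", "year")) %>%
  arrange(name, year)
result
```

Combinations that never occur for a name/year pair stay NA, as in the answer's output.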
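This can be shortened a little by writing a function:
# Sum revenue over the given grouping columns, then express each sum as a
# share of its name/year total. Note: reads `df` from the global environment.
fnc = function(...) {
  df %>%
    group_by(!!!quos(...)) %>%
    summarise(rev=sum(revenue)) %>%
    group_by(name, year) %>%
    mutate(Percent=rev/sum(rev))
}

dat.summary = fnc(name, year) %>%
  select(-Percent) %>%          # Percent is always 1 at the name/year level
  rename(yr_total=rev) %>%      # same column name as in the version above
  left_join(fnc(name, year, prod_type) %>%
              select(-rev) %>%
              spread(prod_type, Percent)) %>%
  left_join(fnc(name, year, prod_type, prod_color) %>%
              unite(type_color, prod_type, prod_color) %>%
              select(-rev) %>%
              spread(type_color, Percent))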
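One possible refinement of `fnc()` is to pass the data frame in explicitly. The sketch below assumes dplyr 1.0.0 or later (for the `.groups` argument of `summarise()`); `share_by` is a hypothetical name, and forwarding `...` straight to `group_by()` is an alternative to `!!!quos(...)`.

```r
library(dplyr)

# Sample data from the question
df <- data.frame("name"=c(letters[1:4], c(letters[1:3], "a")),
                 "year"=c(rep(2012,4), rep(2014, 4)),
                 "prod_type"=c("car","car","boat","plane","boat","car","plane","plane"),
                 "prod_color"=c("red","blue","red","blue","green","red","blue","blue"),
                 "revenue"=c(1000,2000,4000,5000,9000,2000,6000, 10000))

# Hypothetical helper: same steps as fnc(), but the data frame is an argument
# instead of being looked up in the global environment
share_by <- function(data, ...) {
  data %>%
    group_by(...) %>%                                # grouping columns, e.g. name, year, prod_type
    summarise(rev = sum(revenue), .groups = "drop") %>%
    group_by(name, year) %>%                         # shares relative to each name/year total
    mutate(Percent = rev / sum(rev)) %>%
    ungroup()
}

share_by(df, name, year, prod_type)
```

Passing the data explicitly also avoids a subtle trap: if no `df` object exists, the name `df` inside `fnc()` resolves to `stats::df()`, the F-distribution density function, and the error message is confusing.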