Calculate top N products by sales and year
I have sales data by year and by product, say something like this:
Year <- c(2010,2010,2010,2010,2010,2011,2011,2011,2011,2011,2012,2012,2012,2012,2012)
Model <- c("a","b","c","d","e","a","b","c","d","e","a","b","c","d","e")
Sale <- c("30","45","23","33","24","11","56","19","45","56","33","32","89","33","12")
df <- data.frame(Year, Model, Sale)
Total sales by product over the three years:
a= 30+11+33 = 74
b= 45+56+32 = 133
c= 23+19+89 = 131
d= 33+45+33 = 111
e= 12+56+24 = 92
Ranking by total sales over these 3 years:
1 2 3 4 5
b c d e a
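The totals and the ranking above can be double-checked with a short base R sketch. It is only a verification aid: the names totals and ranking are illustrative, and rep() is used merely as a compact way to rebuild the same Year and Model vectors.

```r
# Rebuild the example data; Sale is stored as character, as in the question
Year  <- rep(c(2010, 2011, 2012), each = 5)
Model <- rep(c("a", "b", "c", "d", "e"), times = 3)
Sale  <- c("30", "45", "23", "33", "24", "11", "56", "19", "45", "56",
           "33", "32", "89", "33", "12")
df <- data.frame(Year, Model, Sale)

# Sum sales per model across all years (convert to numeric first)
totals <- tapply(as.numeric(df$Sale), df$Model, sum)

# Sort from the highest total to the lowest
ranking <- sort(totals, decreasing = TRUE)
print(ranking)
```

The names of ranking give the model order (b, c, d, e, a) and its values give the totals listed above.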
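I want code that identifies the top 2 products (based on total sales over these 3 years) for each year and lumps all remaining products into a category "other". So the output should look like this: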
year Model Sale
2010 b 45
2010 c 23
2010 other 30+33+24=87
2011 b 56
2011 c 19
2011 other 11+45+56=112
2012 b 32
2012 c 89
2012 other 33+33+12= 78
A tidyverse solution. Your Sale data appears to be stored as character, which means we have to use as.numeric before summing it.
library(tidyverse)

df %>%
  group_by(Model) %>%
  mutate(
    Sale = as.numeric(Sale),  # Sale is stored as character
    total_sale = sum(Sale)    # total per model across all years
  ) %>%
  ungroup() %>%
  mutate(
    # keep models whose total is one of the two highest totals; lump the rest
    model_condensed = ifelse(total_sale %in% rev(sort(unique(total_sale)))[1:2], Model, 'other')
  ) %>%
  group_by(Year, model_condensed) %>%
  summarize(Sale = sum(Sale))
   Year model_condensed  Sale
  <dbl> <chr>           <dbl>
1  2010 b                  45
2  2010 c                  23
3  2010 other              87
4  2011 b                  56
5  2011 c                  19
6  2011 other             112
7  2012 b                  32
8  2012 c                  89
9  2012 other              78
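The as.numeric() conversion in the first solution matters for two reasons, which the following minimal sketch illustrates on a few of the question's values. The variable names here are illustrative only.

```r
sale_chr <- c("30", "45", "23")

# sum() does not accept character input, so this call raises an error
sum_failed <- inherits(try(sum(sale_chr), silent = TRUE), "try-error")

# After conversion the values can be summed
total <- sum(as.numeric(sale_chr))

# Character strings are compared as text, not as numbers,
# so "9" sorts after "10"
text_compare <- "9" > "10"

print(c(sum_failed = sum_failed, text_compare = text_compare))
print(total)
```

Converting once, early in the pipeline, avoids both the error and any ranking based on text order.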
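The first solution creates the "other" category by matching on the values of the summed sales (total_sale). This may cause problems if those values have decimal places (see this question). Instead, we can use a two-step process that identifies the top two models by name and then uses those names to define the grouping for the full data: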
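# Step 1: total sales per model, keeping the two best-selling models
totals <- df %>%
  group_by(Model) %>%
  summarize(total_sale = sum(as.numeric(Sale))) %>%
  arrange(desc(total_sale)) %>%
  slice_head(n = 2)

# Step 2: group by year and by "top model or other", then sum sales
df %>%
  group_by(Year, model_condensed = ifelse(Model %in% totals$Model, Model, 'other')) %>%
  summarize(Sale = sum(as.numeric(Sale)))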
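Since the question asks about the top N in general, the two-step approach could be wrapped in a small function. This is only a sketch: top_n_with_other is a hypothetical helper name, not part of dplyr, and it swaps arrange() plus slice_head() for slice_max() with with_ties = FALSE, which assumes that ties at the cutoff should be broken by row order rather than kept.

```r
library(dplyr)

# Example data from the question (Sale stored as character)
df <- data.frame(
  Year  = rep(c(2010, 2011, 2012), each = 5),
  Model = rep(c("a", "b", "c", "d", "e"), times = 3),
  Sale  = c("30", "45", "23", "33", "24", "11", "56", "19", "45", "56",
            "33", "32", "89", "33", "12")
)

# Hypothetical helper: keep the n best-selling models overall and
# sum every other model into "other", separately for each year
top_n_with_other <- function(data, n = 2) {
  data <- data %>%
    mutate(Model = as.character(Model), Sale = as.numeric(Sale))

  # Step 1: names of the n models with the highest total sales
  top_models <- data %>%
    group_by(Model) %>%
    summarize(total_sale = sum(Sale), .groups = "drop") %>%
    slice_max(total_sale, n = n, with_ties = FALSE) %>%
    pull(Model)

  # Step 2: group by year and condensed model name, then sum
  data %>%
    group_by(Year, model_condensed = ifelse(Model %in% top_models, Model, "other")) %>%
    summarize(Sale = sum(Sale), .groups = "drop")
}

result <- top_n_with_other(df, n = 2)
print(result)
```

Calling top_n_with_other(df, n = 3) would keep b, c and d and sum a and e into "other". Matching on model names rather than on summed values is the same design choice as in the two-step answer.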