Calculate Top N products by sales and years

I have data on sales by year and by product, let's say like this:

Year <- c(2010,2010,2010,2010,2010,2011,2011,2011,2011,2011,2012,2012,2012,2012,2012)
Model <- c("a","b","c","d","e","a","b","c","d","e","a","b","c","d","e")
Sale <- c("30","45","23","33","24","11","56","19","45","56","33","32","89","33","12")
df <- data.frame(Year, Model, Sale)

Sums by product across the years:

a= 30+11+33 = 74
b= 45+56+32 = 133
c= 23+19+89 = 131
d= 33+45+33 = 111
e= 12+56+24 = 92

Ranking by total sales over these 3 years:

1 2 3 4 5 
b c d e a
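
The totals and ranking above can be reproduced with a few lines of base R. This is only a sketch: it rebuilds the same data with rep() for brevity, stringsAsFactors = FALSE is an added assumption so that Sale stays character on R versions before 4.0, and the names totals and ranking are illustrative.

```r
# Same data as above; Sale is stored as character
Year  <- rep(c(2010, 2011, 2012), each = 5)
Model <- rep(c("a", "b", "c", "d", "e"), times = 3)
Sale  <- c("30","45","23","33","24","11","56","19","45","56","33","32","89","33","12")
df <- data.frame(Year, Model, Sale, stringsAsFactors = FALSE)

# Total sales per model across all years (convert Sale to numeric before summing)
totals <- tapply(as.numeric(df$Sale), df$Model, sum)

# Model names ordered from highest to lowest total
ranking <- names(sort(totals, decreasing = TRUE))
ranking
# "b" "c" "d" "e" "a"
```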

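I would like code that identifies the top 2 products (based on total sales over the 3 years) for each year and lumps all the remaining products into a category "other". So the output should look like this: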

year     Model          Sale
2010      b              45
2010      c              23
2010      other          30+33+24=87
2011      b              56
2011      c              19
2011      other          11+45+56=112
2012      b              32
2012      c              89
2012      other          33+33+12= 78

A tidyverse solution. Your Sale data appear to be stored as character, which means we have to use as.numeric before summing them.

library(tidyverse)

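df %>% 
  # total sales per model across all years (Sale is converted from character first)
  group_by(Model) %>% 
  mutate(
    Sale = as.numeric(Sale),
    total_sale = sum(Sale)
  ) %>% 
  ungroup() %>% 
  # keep models whose total is among the two highest totals; label the rest 'other'
  mutate(
    model_condensed = ifelse(total_sale %in% rev(sort(unique(total_sale)))[1:2], Model, 'other')
  ) %>% 
  # sum sales within each year for each condensed category
  group_by(Year, model_condensed) %>% 
  summarize(Sale = sum(Sale))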

   Year model_condensed  Sale
  <dbl> <chr>           <dbl>
1  2010 b                  45
2  2010 c                  23
3  2010 other              87
4  2011 b                  56
5  2011 c                  19
6  2011 other             112
7  2012 b                  32
8  2012 c                  89
9  2012 other              78

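The solution above creates the "other" category by matching on the summed Sale values (total_sale). If those values have decimal places, this can cause problems (see this question). Instead, we can use a two-step process that identifies the top two models by name and uses that to create the grouping for the summarized data: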

totals <- df %>% 
  group_by(Model) %>% 
  summarize(total_sale = sum(as.numeric(Sale))) %>% 
  arrange(desc(total_sale)) %>% 
  slice_head(n = 2)

df %>% 
  group_by(Year, model_condensed = ifelse(Model %in% totals$Model, Model, 'other')) %>% 
  summarize(Sale = sum(as.numeric(Sale)))
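
As a small illustration of the decimal-place concern mentioned above (a generic example, not taken from the linked question): numbers that look equal on paper may not be stored identically, so exact matching with == or %in% can miss them, while matching on names is not affected. The variables x, y and top_models are made up for the sketch.

```r
# Two values that are mathematically equal but computed differently
x <- 0.1 + 0.2
y <- 0.3

x == y                    # FALSE: the stored binary values differ slightly
x %in% y                  # FALSE for the same reason
isTRUE(all.equal(x, y))   # TRUE: comparison within a small tolerance

# Matching on names, as in the two-step approach, is exact
top_models <- c("b", "c")
"b" %in% top_models       # TRUE
"d" %in% top_models       # FALSE
```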