R: sum largest n rows into a column

I have a data frame that looks like this:

library(tibble)

df <- tibble(trans_id = 1:5,
             name = c('A', 'B', 'C', 'D', 'E'),
             Yr2020 = c(100, 200, 300, 400, 500),
             Yr2019 = c(10, 20, 30, 40, 50),
             Yr2018 = c(1, 2, 3, 4, 5),
             Yr2017 = c(1000, 2000, 3000, 4000, 5000),
             Yr2016 = c(20,30,40,50,60),
             Yr2015 = c(200,300,400,500,600),
             Yr2014 = c(2000,3000,4000,5000,6000))

# A tibble: 5 x 9
  trans_id name  Yr2020 Yr2019 Yr2018 Yr2017 Yr2016 Yr2015 Yr2014
     <int> <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1        1 A        100     10      1   1000     20    200   2000
2        2 B        200     20      2   2000     30    300   3000
3        3 C        300     30      3   3000     40    400   4000
4        4 D        400     40      4   4000     50    500   5000
5        5 E        500     50      5   5000     60    600   6000

For each row, I want to sum the 4 largest values from 2019 through 2014, plus the 2020 value.

Expected result:

# A tibble: 5 x 3
  trans_id name   Top5
     <int> <chr> <dbl>
1        1 A      3320
2        2 B      5530
3        3 C      7740
4        4 D      9950
5        5 E     12160

I usually do this in Excel with a SUMLARGE function, but I'm trying to speed up some of my manual tasks.

You can try this base R approach:

# Find the 2014-2019 columns by name
index <- grep(paste(2014:2019, collapse = "|"), names(df))
# Sum the 4 largest values in each row, then add the 2020 value
df$Var <- apply(df[, index], 1, function(x) sum(sort(x, decreasing = TRUE)[1:4])) + df[["Yr2020"]]

Output:

# A tibble: 5 x 10
  trans_id name  Yr2020 Yr2019 Yr2018 Yr2017 Yr2016 Yr2015 Yr2014   Var
     <int> <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <dbl>
1        1 A        100     10      1   1000     20    200   2000  3320
2        2 B        200     20      2   2000     30    300   3000  5530
3        3 C        300     30      3   3000     40    400   4000  7740
4        4 D        400     40      4   4000     50    500   5000  9950
5        5 E        500     50      5   5000     60    600   6000 12160
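
One caveat with `sort(x, decreasing = TRUE)[1:4]`: if a row has fewer than 4 non-missing values, the index pads the result with `NA` and the sum becomes `NA`. The sketch below is one possible way to guard against that, using `head()` instead of `[1:4]`. The helper name `sum_top_n` is made up for this example, and the data is rebuilt as a plain data frame so the block runs on its own.

```r
# Hypothetical helper (not from the answer above): sum the n largest values.
sum_top_n <- function(x, n = 4) {
  # sort() drops NA by default; head() returns at most n elements,
  # so a short vector is never padded with NA.
  sum(head(sort(x, decreasing = TRUE), n))
}

# Same data as in the question, as a base data frame.
df <- data.frame(
  trans_id = 1:5,
  name = c("A", "B", "C", "D", "E"),
  Yr2020 = c(100, 200, 300, 400, 500),
  Yr2019 = c(10, 20, 30, 40, 50),
  Yr2018 = c(1, 2, 3, 4, 5),
  Yr2017 = c(1000, 2000, 3000, 4000, 5000),
  Yr2016 = c(20, 30, 40, 50, 60),
  Yr2015 = c(200, 300, 400, 500, 600),
  Yr2014 = c(2000, 3000, 4000, 5000, 6000)
)

# Select the 2019-2014 columns explicitly by name.
year_cols <- paste0("Yr", 2019:2014)

# Top 4 of 2019-2014 per row, plus the 2020 value.
df$Top5 <- apply(df[, year_cols], 1, sum_top_n, n = 4) + df$Yr2020
```

Building the column names with `paste0()` rather than a regular expression also avoids accidentally matching other columns whose names happen to contain a year.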

Or a dplyr version using c_across():

library(dplyr)
# Work row by row: sum the 4 largest of 2019-2014, then add 2020
df %>% rowwise(trans_id) %>% 
  mutate(Sum = sum(head(sort(c_across(Yr2019:Yr2014), decreasing = TRUE), 4)) + Yr2020)

Output:

# A tibble: 5 x 10
# Rowwise:  trans_id
  trans_id name  Yr2020 Yr2019 Yr2018 Yr2017 Yr2016 Yr2015 Yr2014   Sum
     <int> <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <dbl>
1        1 A        100     10      1   1000     20    200   2000  3320
2        2 B        200     20      2   2000     30    300   3000  5530
3        3 C        300     30      3   3000     40    400   4000  7740
4        4 D        400     40      4   4000     50    500   5000  9950
5        5 E        500     50      5   5000     60    600   6000 12160

dplyr:

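# c_across() returns the row's values as a vector that sort() can handle;
# the range starts at Yr2019 to match the question
df %>% rowwise() %>% 
  mutate(top5 = Yr2020 + sum(sort(c_across(Yr2019:Yr2014), decreasing = TRUE)[1:4]))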
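
For wider data, `rowwise()` can get slow. A different technique from the answers above is to reshape to long format and take the top 4 per group. The following is only a sketch, assuming the tidyr package is available; the names `year`, `value`, and `result` are chosen here for illustration.

```r
library(dplyr)
library(tidyr)

# Same data as in the question (tibble() is re-exported by dplyr).
df <- tibble(trans_id = 1:5,
             name = c("A", "B", "C", "D", "E"),
             Yr2020 = c(100, 200, 300, 400, 500),
             Yr2019 = c(10, 20, 30, 40, 50),
             Yr2018 = c(1, 2, 3, 4, 5),
             Yr2017 = c(1000, 2000, 3000, 4000, 5000),
             Yr2016 = c(20, 30, 40, 50, 60),
             Yr2015 = c(200, 300, 400, 500, 600),
             Yr2014 = c(2000, 3000, 4000, 5000, 6000))

result <- df %>%
  # One row per (transaction, year) for 2019-2014; Yr2020 stays as a column.
  pivot_longer(Yr2019:Yr2014, names_to = "year", values_to = "value") %>%
  group_by(trans_id, name) %>%
  # Keep exactly the 4 largest values per transaction.
  slice_max(value, n = 4, with_ties = FALSE) %>%
  # Add the 2020 value (constant within each group) to their sum.
  summarise(top5 = first(Yr2020) + sum(value), .groups = "drop")
```

Here `with_ties = FALSE` keeps exactly 4 rows per group even when values are tied, which matches what `sort(...)[1:4]` does in the row-wise versions.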