R: sum largest n rows into a column
I have a data frame that looks like this:
library(tibble)

df <- tibble(trans_id = 1:5,
             name = c('A', 'B', 'C', 'D', 'E'),
             Yr2020 = c(100, 200, 300, 400, 500),
             Yr2019 = c(10, 20, 30, 40, 50),
             Yr2018 = c(1, 2, 3, 4, 5),
             Yr2017 = c(1000, 2000, 3000, 4000, 5000),
             Yr2016 = c(20, 30, 40, 50, 60),
             Yr2015 = c(200, 300, 400, 500, 600),
             Yr2014 = c(2000, 3000, 4000, 5000, 6000))
# A tibble: 5 x 9
trans_id name Yr2020 Yr2019 Yr2018 Yr2017 Yr2016 Yr2015 Yr2014
<int> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 A 100 10 1 1000 20 200 2000
2 2 B 200 20 2 2000 30 300 3000
3 3 C 300 30 3 3000 40 400 4000
4 4 D 400 40 4 4000 50 500 5000
5 5 E 500 50 5 5000 60 600 6000
For each row, I want to sum the 4 largest values from Yr2019 through Yr2014 and then add the Yr2020 value.
Expected result:
# A tibble: 5 x 3
trans_id name Top5
<int> <chr> <dbl>
1 1 A 3320
2 2 B 5530
3 3 C 7740
4 4 D 9950
5 5 E 12160
I usually do this in Excel with the SUM and LARGE functions, but I am trying to speed up some manual tasks.
You can try this base R approach:
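# Find the positions of the Yr2014 through Yr2019 columns
index <- grep(paste0(2014:2019, collapse = '|'), names(df))
# For each row, sum the 4 largest of those values, then add Yr2020
df$Var <- apply(df[, index], 1, function(x) sum(sort(x, decreasing = TRUE)[1:4])) + df[['Yr2020']]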
Output:
# A tibble: 5 x 10
trans_id name Yr2020 Yr2019 Yr2018 Yr2017 Yr2016 Yr2015 Yr2014 Var
<int> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 A 100 10 1 1000 20 200 2000 3320
2 2 B 200 20 2 2000 30 300 3000 5530
3 3 C 300 30 3 3000 40 400 4000 7740
4 4 D 400 40 4 4000 50 500 5000 9950
5 5 E 500 50 5 5000 60 600 6000 12160
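If some rows could have missing values or fewer than 4 usable numbers, the `[1:4]` indexing above would pad with `NA` and the sum would come out as `NA`. A minimal sketch of a reusable helper that avoids this is shown below. The name `sum_top_n` and the output column `Top5` are just illustrative choices, and the data is rebuilt as a plain `data.frame` so the snippet runs without any packages.

```r
# Hypothetical helper: sum the n largest values of a numeric vector.
sum_top_n <- function(x, n = 4) {
  # sort() drops NA values by default, and head() returns at most n
  # elements, so short or partly missing rows do not turn into NA.
  sum(head(sort(x, decreasing = TRUE), n))
}

# Same data as in the question, built with base R only.
df <- data.frame(
  trans_id = 1:5,
  name = c("A", "B", "C", "D", "E"),
  Yr2020 = c(100, 200, 300, 400, 500),
  Yr2019 = c(10, 20, 30, 40, 50),
  Yr2018 = c(1, 2, 3, 4, 5),
  Yr2017 = c(1000, 2000, 3000, 4000, 5000),
  Yr2016 = c(20, 30, 40, 50, 60),
  Yr2015 = c(200, 300, 400, 500, 600),
  Yr2014 = c(2000, 3000, 4000, 5000, 6000)
)

# Select the year columns by exact name instead of a regular expression.
year_cols <- paste0("Yr", 2019:2014)

# Row-wise top-4 sum over 2019..2014, plus the 2020 value.
df$Top5 <- apply(df[, year_cols], 1, sum_top_n, n = 4) + df$Yr2020
```

Selecting the columns by exact name is a design choice here: it avoids accidentally matching other columns whose names happen to contain one of the years.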
Or a dplyr version using c_across():
library(dplyr)

# Work row by row: sum the 4 largest values of Yr2019..Yr2014, then add Yr2020
df %>%
  rowwise(trans_id) %>%
  mutate(Sum = sum(head(sort(c_across(Yr2019:Yr2014), decreasing = TRUE), 4)) + Yr2020)
Output:
# A tibble: 5 x 10
# Rowwise: trans_id
trans_id name Yr2020 Yr2019 Yr2018 Yr2017 Yr2016 Yr2015 Yr2014 Sum
<int> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 A 100 10 1 1000 20 200 2000 3320
2 2 B 200 20 2 2000 30 300 3000 5530
3 3 C 300 30 3 3000 40 400 4000 7740
4 4 D 400 40 4 4000 50 500 5000 9950
5 5 E 500 50 5 5000 60 600 6000 12160
dplyr:
# c_across() returns the current row's values as a plain vector that sort() can handle
df %>%
  rowwise() %>%
  mutate(top5 = Yr2020 + sum(sort(c_across(Yr2019:Yr2014), decreasing = TRUE)[1:4]))
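For larger data, row-wise operations can get slow, so a different technique is to reshape to long format and take the top values per group. The following is only a sketch of that alternative, assuming tidyr is installed and dplyr is at least version 1.0.0 (for `slice_max()`). The intermediate names `top4`, `year`, `value`, and `result` are made up for illustration.

```r
library(dplyr)
library(tidyr)

df <- tibble(trans_id = 1:5,
             name = c("A", "B", "C", "D", "E"),
             Yr2020 = c(100, 200, 300, 400, 500),
             Yr2019 = c(10, 20, 30, 40, 50),
             Yr2018 = c(1, 2, 3, 4, 5),
             Yr2017 = c(1000, 2000, 3000, 4000, 5000),
             Yr2016 = c(20, 30, 40, 50, 60),
             Yr2015 = c(200, 300, 400, 500, 600),
             Yr2014 = c(2000, 3000, 4000, 5000, 6000))

# One row per (trans_id, year), then keep the 4 largest values per trans_id.
# with_ties = FALSE guarantees at most 4 rows per group even if values repeat.
top4 <- df %>%
  pivot_longer(Yr2019:Yr2014, names_to = "year", values_to = "value") %>%
  group_by(trans_id) %>%
  slice_max(value, n = 4, with_ties = FALSE) %>%
  summarise(top4 = sum(value))

# Join the per-row top-4 sum back and add the Yr2020 value.
result <- df %>%
  select(trans_id, name, Yr2020) %>%
  left_join(top4, by = "trans_id") %>%
  transmute(trans_id, name, Top5 = Yr2020 + top4)
```

This assumes `trans_id` uniquely identifies each row. If it does not, a row number column would be needed as the grouping key instead.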