How to add a column that ranks values?
I asked a similar question a while ago, but later realized that my problem is actually more complex. Sorry for asking again.
df <- data.frame(
comp_name = c("A","A","B","B","A","A","B","B","C","C","D","D","C","C","D","D"),
country = c("US","US","US","US","US","US","US","US","France","France","France","France","France","France","France","France"),
year = c("2018","2018","2018","2018","2019","2019","2019","2019","2018","2018","2018","2018","2019","2019","2019","2019"),
type = c("profit", "revenue","profit", "revenue","profit", "revenue","profit", "revenue","profit", "revenue","profit", "revenue","profit", "revenue","profit", "revenue"),
value = c(10,20,30,40,20,30,40,50,140,150,120,130,100,110,80,90)
)
df:
comp_name country year type value
1 A US 2018 profit 10
2 A US 2018 revenue 20
3 B US 2018 profit 30
4 B US 2018 revenue 40
5 A US 2019 profit 20
6 A US 2019 revenue 30
7 B US 2019 profit 40
8 B US 2019 revenue 50
9 C France 2018 profit 140
10 C France 2018 revenue 150
11 D France 2018 profit 120
12 D France 2018 revenue 130
13 C France 2019 profit 100
14 C France 2019 revenue 110
15 D France 2019 profit 80
16 D France 2019 revenue 90
I want to add a rank column like this:
comp_name country year type value rank
1 A US 2018 profit 10
2 A US 2018 revenue 20
3 B US 2018 profit 30
4 B US 2018 revenue 40
5 A US 2019 profit 20 2
6 A US 2019 revenue 30
7 B US 2019 profit 40 1
8 B US 2019 revenue 50
9 C France 2018 profit 140
10 C France 2018 revenue 150
11 D France 2018 profit 120
12 D France 2018 revenue 130
13 C France 2019 profit 100 1
14 C France 2019 revenue 110
15 D France 2019 profit 80 2
16 D France 2019 revenue 90
I only want to consider profit in 2019, and rank the companies by that profit within each country.
When I asked this before, @KarthikS provided the following solution:
library(dplyr)
df %>% group_by(country) %>% mutate(rank = rank(desc(value)))
However, I have now added more variables (year and type), and I want the ranking to take those into account as well.
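To make the problem concrete, here is a small sketch of what the country-only grouping gives on the extended data. The data frame is rebuilt with `rep()` only to keep the example short (it should contain the same rows as `df` above), and `by_country` is just a name chosen for this illustration.

```r
library(dplyr)

# Same 16 rows as the df above, written compactly with rep()
df <- data.frame(
  comp_name = c("A","A","B","B","A","A","B","B","C","C","D","D","C","C","D","D"),
  country   = rep(c("US", "France"), each = 8),
  year      = rep(rep(c("2018", "2019"), each = 4), times = 2),
  type      = rep(c("profit", "revenue"), times = 8),
  value     = c(10,20,30,40,20,30,40,50,140,150,120,130,100,110,80,90)
)

# Grouping by country alone ranks all 8 rows of a country together,
# mixing both years and both types
by_country <- df %>%
  group_by(country) %>%
  mutate(rank = rank(desc(value))) %>%
  ungroup()

# The four rows of interest (2019 profit) no longer get ranks 1 and 2
by_country %>% filter(year == "2019", type == "profit")
```

With this grouping the 2019 profit rows are ranked against revenue and 2018 values too, so ties and unrelated rows push their ranks away from the 1 and 2 I am after.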
Please let me know if the question is unclear. I am new to R, and any help is greatly appreciated.
Thanks!
Calculate the rank for all countries, all years and all types, then remove the values you don't need. (Or keep them.)
library(dplyr)
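df %>%
  # rank within every country/year/type combination, highest value first
  group_by(country, year, type) %>%
  mutate(rank = rank(desc(value))) %>%
  ungroup() %>%
  # keep only the 2019 profit ranks; year is stored as character in df
  mutate(rank = if_else(year == "2019" & type == "profit", rank, NA_real_))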
# # A tibble: 16 x 6
# comp_name country year type value rank
# <chr> <chr> <chr> <chr> <dbl> <dbl>
# 1 A US 2018 profit 10 NA
# 2 A US 2018 revenue 20 NA
# 3 B US 2018 profit 30 NA
# 4 B US 2018 revenue 40 NA
# 5 A US 2019 profit 20 2
# 6 A US 2019 revenue 30 NA
# 7 B US 2019 profit 40 1
# 8 B US 2019 revenue 50 NA
# 9 C France 2018 profit 140 NA
# 10 C France 2018 revenue 150 NA
# 11 D France 2018 profit 120 NA
# 12 D France 2018 revenue 130 NA
# 13 C France 2019 profit 100 1
# 14 C France 2019 revenue 110 NA
# 15 D France 2019 profit 80 2
# 16 D France 2019 revenue 90 NA
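If you would rather not compute ranks that are thrown away afterwards, one possible variation is to flag the rows of interest first and rank only those. This is a sketch under a few assumptions: `target` is a helper column introduced here for illustration, `ranked` is an arbitrary result name, and `min_rank()` (a dplyr function that returns integer ranks and gives tied values the smallest rank) is swapped in for `rank()`, which may or may not be the tie handling you want.

```r
library(dplyr)

# Same 16 rows as the question's df, written compactly with rep()
df <- data.frame(
  comp_name = c("A","A","B","B","A","A","B","B","C","C","D","D","C","C","D","D"),
  country   = rep(c("US", "France"), each = 8),
  year      = rep(rep(c("2018", "2019"), each = 4), times = 2),
  type      = rep(c("profit", "revenue"), times = 8),
  value     = c(10,20,30,40,20,30,40,50,140,150,120,130,100,110,80,90)
)

ranked <- df %>%
  # flag the rows that should receive a rank (hypothetical helper column)
  mutate(target = year == "2019" & type == "profit") %>%
  # flagged and unflagged rows of a country form separate groups
  group_by(country, target) %>%
  # rank only inside the flagged groups, leave everything else NA
  mutate(rank = if_else(target, min_rank(desc(value)), NA_integer_)) %>%
  ungroup() %>%
  select(-target)

ranked
```

The result should match the table above, except that `rank` is an integer column here. To change which year or type is ranked, only the condition that defines `target` needs to be edited.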