How to add a ranking column for this dataset?
My data looks like this:
df <- data.frame(
  comp_name = c("A","B","C","D","E","F","G","H","J","K","L","M"),
  country = c("US", "UK", "France", "Germany", "US", "UK", "France", "Germany", "US", "UK", "France", "Germany"),
  profit = c(100,125,150,165,150,110,110,125,130,250,95,100)
)
df:
comp_name country profit
1 A US 100
2 B UK 125
3 C France 150
4 D Germany 165
5 E US 150
6 F UK 110
7 G France 110
8 H Germany 125
9 J US 130
10 K UK 250
11 L France 95
12 M Germany 100
I would like to add a ranking column to this data frame that ranks the companies by profit within each country, like this:
comp_name country profit rank
1 A US 100 3
2 B UK 125 2
3 C France 150 1
4 D Germany 165 1
5 E US 150 1
6 F UK 110 3
7 G France 110 2
8 H Germany 125 2
9 J US 130 2
10 K UK 250 1
11 L France 95 3
12 M Germany 100 3
I am fairly new to R and don't know where to start. Any help would be appreciated. Thanks!
Does this work:
library(dplyr)
df %>% group_by(country) %>% mutate(rank = rank(desc(profit)))
# A tibble: 12 x 4
# Groups: country [4]
comp_name country profit rank
<chr> <chr> <dbl> <dbl>
1 A US 100 3
2 B UK 125 2
3 C France 150 1
4 D Germany 165 1
5 E US 150 1
6 F UK 110 3
7 G France 110 2
8 H Germany 125 2
9 J US 130 2
10 K UK 250 1
11 L France 95 3
12 M Germany 100 3
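One caveat: rank() uses ties.method = "average" by default, so two companies with the same profit in one country would get fractional ranks. The question's data has no ties within a country, so this does not show up above. Below is a minimal sketch of the alternatives, assuming integer ranks are wanted; the data frame `tied` is made up for illustration and is not from the question.

```r
library(dplyr)

# Hypothetical data with a tie inside one country (not from the question)
tied <- data.frame(
  comp_name = c("A", "B", "C"),
  country   = c("US", "US", "US"),
  profit    = c(150, 150, 100)
)

ranked <- tied %>%
  group_by(country) %>%
  mutate(
    rank_avg   = rank(desc(profit)),       # ties share the average rank: 1.5, 1.5, 3
    rank_min   = min_rank(desc(profit)),   # ties share the lowest rank:  1, 1, 3
    rank_dense = dense_rank(desc(profit))  # no gaps after ties:          1, 1, 2
  ) %>%
  ungroup()                                # drop the grouping for later steps
```

Which variant fits depends on how tied companies should be reported; min_rank() matches the usual "competition" ranking.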
df %>%
  dplyr::group_by(country) %>%
  # .keep = TRUE retains the grouping column (country) in each group's data
  dplyr::group_map(function(x, y) {
    x %>% dplyr::mutate(rank = rank(-profit))
  }, .keep = TRUE) %>%
  dplyr::bind_rows()
Karthik S has provided a more concise answer.
Clearly, group_map is redundant here.
A data.table option
library(data.table)
setDT(df)[, Rank := frank(-profit), country]
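Note that setDT() converts df to a data.table by reference, so the original object itself changes. If that is not wanted, the following is a sketch of a variant that works on a copy; the name `dt` is only an illustrative choice.

```r
library(data.table)

df <- data.frame(
  comp_name = c("A","B","C","D","E","F","G","H","J","K","L","M"),
  country = c("US", "UK", "France", "Germany", "US", "UK", "France", "Germany", "US", "UK", "France", "Germany"),
  profit = c(100,125,150,165,150,110,110,125,130,250,95,100)
)

# as.data.table() returns a new data.table, so df stays a plain data.frame
dt <- as.data.table(df)

# := adds the column to dt in place; by = country ranks within each country
dt[, Rank := frank(-profit), by = country]
```

After this, dt has the Rank column while df keeps its original three columns.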
A base R option using rank + ave
transform(
  df,
  Rank = ave(-profit, country, FUN = rank)
)
gives
comp_name country profit Rank
1 A US 100 3
2 B UK 125 2
3 C France 150 1
4 D Germany 165 1
5 E US 150 1
6 F UK 110 3
7 G France 110 2
8 H Germany 125 2
9 J US 130 2
10 K UK 250 1
11 L France 95 3
12 M Germany 100 3
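The output above keeps the original row order. If it is easier to read with each country's companies listed from best to worst, the rows can be sorted afterwards. A small base R sketch, where the names `ranked` and `sorted` are just illustrative:

```r
df <- data.frame(
  comp_name = c("A","B","C","D","E","F","G","H","J","K","L","M"),
  country = c("US", "UK", "France", "Germany", "US", "UK", "France", "Germany", "US", "UK", "France", "Germany"),
  profit = c(100,125,150,165,150,110,110,125,130,250,95,100)
)

# Rank within each country, highest profit first
ranked <- transform(df, Rank = ave(-profit, country, FUN = rank))

# Sort by country, then by rank within the country
sorted <- ranked[order(ranked$country, ranked$Rank), ]
```

Here sorted lists France first (C, G, L), then Germany (D, H, M), UK (K, B, F) and US (E, J, A).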