R: select rows by group and separate them into new ranked sets
My dataset looks like this:
Interest Age Gender Scored.Probabilities
AL008 18-24 male 0.211
AL024 25-34 male 0.022
AL008 35-44 female 0.102
AL008 25-34 female 0.002
AL024 13-17 male 0.102
AL035 35-44 female 0.027
AL024 35-44 female 0.051
AL024 55-64 male 0.025
AL024 35-44 male 0.016
AL034 45-54 male 0.021
AL036 35-44 male 0.082
I want to group the rows that share the same name in the 'Interest' column into sets and create a new dataset ranked by 'Scored.Probabilities' within each set:
Set Interest Age Gender Scored.Probabilities rank
1 AL008 18-24 male 0.211 1
1 AL008 35-44 female 0.102 2
1 AL008 25-34 female 0.002 3
2 AL024 13-17 male 0.102 1
2 AL024 35-44 female 0.051 2
2 AL024 55-64 male 0.025 3
2 AL024 25-34 male 0.022 4
2 AL024 35-44 male 0.016 5
3 AL034 45-54 male 0.021 1
4 AL035 35-44 female 0.027 1
5 AL036 35-44 male 0.082 1
Try this using dplyr:
library("dplyr")
df <- read.table(text = "Interest Age Gender Scored.Probabilities
AL008 18-24 male 0.211
AL024 25-34 male 0.022
AL008 35-44 female 0.102
AL008 25-34 female 0.002
AL024 13-17 male 0.102
AL035 35-44 female 0.027
AL024 35-44 female 0.051
AL024 55-64 male 0.025
AL024 35-44 male 0.016
AL034 45-54 male 0.021
AL036 35-44 male 0.082", header = TRUE)
# sort by Interest, then by descending probability, and number the rows within each Interest
df %>%
  arrange(Interest, desc(Scored.Probabilities)) %>%
  group_by(Interest) %>%
  mutate(rank = row_number())
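The pipeline above adds rank but not the Set column shown in the desired output. A minimal sketch of one way to get both, assuming dplyr 1.0.0 or later (for cur_group_id()); the name ranked is only illustrative:

```r
library(dplyr)

df <- read.table(text = "Interest Age Gender Scored.Probabilities
AL008 18-24 male 0.211
AL024 25-34 male 0.022
AL008 35-44 female 0.102
AL008 25-34 female 0.002
AL024 13-17 male 0.102
AL035 35-44 female 0.027
AL024 35-44 female 0.051
AL024 55-64 male 0.025
AL024 35-44 male 0.016
AL034 45-54 male 0.021
AL036 35-44 male 0.082", header = TRUE)

ranked <- df %>%
  arrange(Interest, desc(Scored.Probabilities)) %>%
  group_by(Interest) %>%
  mutate(Set = cur_group_id(),   # 1, 2, 3, ... in sorted order of Interest
         rank = row_number()) %>% # position within the group after sorting
  ungroup() %>%
  select(Set, everything())       # move Set to the front

print(ranked)
```

row_number() breaks ties by row order. If equal probabilities within a group should share a rank, min_rank(desc(Scored.Probabilities)) is probably the closer fit.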
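You could try data.table (assuming df1 holds the data from the question):
library(data.table)
# rank rows within each Interest by descending probability, then number the Interest groups
setDT(df1)[order(-Scored.Probabilities), rank := seq_len(.N), by = Interest][
  order(Interest), Set := .GRP, by = Interest][order(Interest, rank)]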
# Interest Age Gender Scored.Probabilities rank Set
#1: AL008 18-24 male 0.211 1 1
#2: AL008 35-44 female 0.102 2 1
#3: AL008 25-34 female 0.002 3 1
#4: AL024 13-17 male 0.102 1 2
#5: AL024 35-44 female 0.051 2 2
#6: AL024 55-64 male 0.025 3 2
#7: AL024 25-34 male 0.022 4 2
#8: AL024 35-44 male 0.016 5 2
#9: AL034 45-54 male 0.021 1 3
#10: AL035 35-44 female 0.027 1 4
#11: AL036 35-44 male 0.082 1 5
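If neither package is available, the same result can probably be reached in base R. This is only a sketch of the same idea (sort, number the groups, count within each group); the name out is illustrative:

```r
df <- read.table(text = "Interest Age Gender Scored.Probabilities
AL008 18-24 male 0.211
AL024 25-34 male 0.022
AL008 35-44 female 0.102
AL008 25-34 female 0.002
AL024 13-17 male 0.102
AL035 35-44 female 0.027
AL024 35-44 female 0.051
AL024 55-64 male 0.025
AL024 35-44 male 0.016
AL034 45-54 male 0.021
AL036 35-44 male 0.082", header = TRUE, stringsAsFactors = FALSE)

# sort by Interest, then by descending probability
out <- df[order(df$Interest, -df$Scored.Probabilities), ]

# Set: index of each Interest among the sorted unique values
out$Set <- match(out$Interest, unique(out$Interest))

# rank: running counter within each Interest (rows are already sorted)
out$rank <- ave(out$Scored.Probabilities, out$Interest, FUN = seq_along)

rownames(out) <- NULL
print(out[, c("Set", "Interest", "Age", "Gender", "Scored.Probabilities", "rank")])
```

Here ave() applies seq_along() to each Interest group, so the counter restarts at 1 for every set.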