Ranking variables by group using data.table

Question

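I am trying to rank one variable within groups defined by two other variables, using frank from data.table. I can't get the by argument to work the way I expect.

Here is my data: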

library(data.table)
# Same values as the dput() output, without the non-reproducible .internal.selfref pointer
z <- data.table(
  indpn = c(170, 170, 170, 170, 170, 170, 9870, 9870, 9870, 9870, 9870, 9870),
  occpn = c(6050, 9130, 205, 5120, 5740, 6005, 3930, 700, 1410, 3645, 1050, 150),
  ncwc = c(258575, 4747, 10742, 205, 867, 11026, 0, 0, 0, 0, 0, 0)
)

Here is the code I am using:

z[ , therank := frank( -ncwc , ties.method ="min" ) , by = .(indpn, occpn) ]

This is what I get:

    indpn occpn   ncwc therank
 1:   170  6050 258575       1
 2:   170  9130   4747       1
 3:   170   205  10742       1
 4:   170  5120    205       1
 5:   170  5740    867       1
 6:   170  6005  11026       1
 7:  9870  3930      0       1
 8:  9870   700      0       1
 9:  9870  1410      0       1
10:  9870  3645      0       1
11:  9870  1050      0       1
12:  9870   150      0       1

I expected therank to be 1, 4, 3, 6, 5, 2, 1, 1, 1, 1, 1, 1.

Answer 1

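As mentioned in the comments, grouping by indpn alone gives the expected output. In this data every (indpn, occpn) combination occurs only once, so grouping by both columns puts each row in its own group, and each row gets rank 1.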

library(data.table)
z[ , therank := frank(-ncwc , ties.method ="min" ) ,indpn]
z

#    indpn occpn   ncwc therank
# 1:   170  6050 258575       1
# 2:   170  9130   4747       4
# 3:   170   205  10742       3
# 4:   170  5120    205       6
# 5:   170  5740    867       5
# 6:   170  6005  11026       2
# 7:  9870  3930      0       1
# 8:  9870   700      0       1
# 9:  9870  1410      0       1
#10:  9870  3645      0       1
#11:  9870  1050      0       1
#12:  9870   150      0       1
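
To check the group sizes behind this, here is a minimal sketch that counts rows per group. It rebuilds the question's data so that it runs on its own; the name group_sizes is only illustrative.

```r
library(data.table)

# Same values as the question's data
z <- data.table(
  indpn = rep(c(170, 9870), each = 6),
  occpn = c(6050, 9130, 205, 5120, 5740, 6005, 3930, 700, 1410, 3645, 1050, 150),
  ncwc  = c(258575, 4747, 10742, 205, 867, 11026, 0, 0, 0, 0, 0, 0)
)

# Rows per (indpn, occpn) group: each group holds a single row,
# so any ranking within these groups can only return 1
group_sizes <- z[, .N, by = .(indpn, occpn)]
print(group_sizes)

# Rows per indpn group: six rows each, so ranks can differ within a group
print(z[, .N, by = indpn])
```

If a by combination yields N == 1 for every group, a within-group rank carries no information, which is usually a sign that the grouping is too fine.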

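However, note how frank behaves with ties.method = "min": a value that follows a run of ties is ranked by its position in the group, so the ranks skip. Is this the output you are looking for?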

z[12, ncwc := -1]  # change the last value by reference to break the tie
z[ , therank := frank( -ncwc , ties.method ="min" ) ,indpn]
z
#    indpn occpn   ncwc therank
# 1:   170  6050 258575       1
# 2:   170  9130   4747       4
# 3:   170   205  10742       3
# 4:   170  5120    205       6
# 5:   170  5740    867       5
# 6:   170  6005  11026       2
# 7:  9870  3930      0       1
# 8:  9870   700      0       1
# 9:  9870  1410      0       1
#10:  9870  3645      0       1
#11:  9870  1050      0       1
#12:  9870   150     -1       6

If you want the last value to be 2 rather than 6 (a dense ranking, with no gaps after ties), you can use match and unique:

z[order(-ncwc) , therank := match(ncwc, unique(ncwc)) ,indpn]
z
#    indpn occpn   ncwc therank
# 1:   170  6050 258575       1
# 2:   170  9130   4747       4
# 3:   170   205  10742       3
# 4:   170  5120    205       6
# 5:   170  5740    867       5
# 6:   170  6005  11026       2
# 7:  9870  3930      0       1
# 8:  9870   700      0       1
# 9:  9870  1410      0       1
#10:  9870  3645      0       1
#11:  9870  1050      0       1
#12:  9870   150     -1       2
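
As a possible alternative to match and unique, frank also accepts ties.method = "dense", which should give the same gap-free ranks. The sketch below rebuilds the modified data (last ncwc set to -1) so that it runs on its own; the column names rank_min and rank_dense are illustrative and not part of the original answer.

```r
library(data.table)

# Question's data with the last ncwc value changed to -1
z <- data.table(
  indpn = rep(c(170, 9870), each = 6),
  occpn = c(6050, 9130, 205, 5120, 5740, 6005, 3930, 700, 1410, 3645, 1050, 150),
  ncwc  = c(258575, 4747, 10742, 205, 867, 11026, 0, 0, 0, 0, 0, -1)
)

# "min" ranking: ties share the lowest rank and the next rank skips ahead
z[, rank_min := frank(-ncwc, ties.method = "min"), by = indpn]

# "dense" ranking: ties share a rank and the next rank follows without a gap
z[, rank_dense := frank(-ncwc, ties.method = "dense"), by = indpn]

print(z[indpn == 9870, .(ncwc, rank_min, rank_dense)])
```

For the indpn == 9870 group, rank_min ends in 6 while rank_dense ends in 2. The two columns agree for indpn == 170 because that group has no ties.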
