frank

Question

I'm confused by the frank function. The documentation here says:

Only for lists, data.frames and data.tables. The columns to calculate ranks based on. Do not quote column names. If ... is missing, all columns are considered by default. To sort by a column in descending order prefix a "-", e.g., frank(x, a, -b, c). The -b works when b is of type character as well.
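Here is a minimal sketch of the syntax that passage describes. It uses a hypothetical two-column table `DT`, not the question's data. The column names are passed unquoted, and a leading `-` ranks that column in descending order. In this case the descending column is a character column.

```r
library(data.table)

# Hypothetical toy table (not the data from the question)
DT <- data.table(a = c(1, 1, 2, 2), b = c("x", "y", "x", "y"))

# Unquoted column names; "-b" ranks the character column b in descending order
r1 <- frank(DT, a, -b, ties.method = "dense")
r1  # 2 1 4 3
```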

So, here is my data:

# .internal.selfref = <pointer: ...> removed: it is not valid R syntax when pasted back
testDT <- structure(list(product = c("Product 1", "Product 1", "Product 1",
    "Product 1", "Product 1", "Product 5", "Product 5", "Product 5",
    "Product 5", "Product 5"), policyID = c("A738-33", "A738-33",
    "A738-33", "A738-33", "A738-33", "A738-33", "A738-33",
    "A738-33", "A738-33", "A738-33"), startYear = c(2014,
    2015, 2016, 2017, 2018, 2014, 2015, 2016, 2017, 2018), total = c("30000",
    "30000", "30000", "30000", "30000", "10000", "10000", "10000",
    "10000", "10000"), daily = c("150", "150", "150", "150", "150",
    "80", "80", "80", "80", "80")), class = c("data.table", "data.frame"
    ), row.names = c(NA, -10L), sorted = "product")

I want to sort this data by the columns total and daily. So I did this:

> setDT(testDT)
> frankv(testDT, totallimit, rbddaily, ties.method="dense")
Error in colnamesInt(x, cols, check_dups = TRUE) : 
  argument specifying columns specify non existing column(s): cols[1]='30000'
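The error message suggests that `frankv()` treated `totallimit` as an ordinary R expression for its `cols` argument. It evaluated to '30000' instead of being read as a column name. The sketch below uses a hypothetical three-row table to show the standard-evaluation form. Here `cols` is a character vector, which can also be stored in a variable. The separate `order` argument (1 or -1 per column) takes the place of the `-` prefix used by `frank()`.

```r
library(data.table)

# Hypothetical toy table standing in for the question's testDT
DT <- data.table(total = c("30000", "10000", "30000"),
                 daily = c("150", "80", "80"))

# frankv() evaluates `cols` like any ordinary argument, so column names
# are given as a character vector (here stored in a variable)
rank_cols <- c("total", "daily")
r_asc <- frankv(DT, cols = rank_cols, ties.method = "dense")

# `order` takes 1 (ascending) or -1 (descending) for each column
r_mixed <- frankv(DT, cols = rank_cols, order = c(1L, -1L), ties.method = "dense")
```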

Strangely, when I use quotes, which is the exact opposite of what the documentation says, I do get a result:

frankv(testDT, cols=c("totallimit", "rbddaily"), ties.method="dense")

I also tried to use this inside a data.table call, and then something else strange happened: from the 10 rows of data I have, I got 100 rows.

testDT[,.(rank = frankv(testDT, cols=c("limit", "daily"), ties.method="dense")), by = c("policyID", "product", "startYear")]
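This is a likely explanation for the 100 rows. Inside `j`, the name `testDT` refers to the whole table, so each group returns all 10 ranks. Grouping by `policyID`, `product` and `startYear` produces 10 single-row groups, which gives 10 × 10 rows. The sketch below reproduces this on a hypothetical four-row table. It then ranks the columns that are visible in `j` instead, passing them to `frankv()` as a list and adding the result by reference with `:=`. With a grouping where every group holds one row, ranking within groups would always give 1.

```r
library(data.table)

# Hypothetical four-row table with the question's column layout
DT <- data.table(policyID  = "A738-33",
                 product   = rep(c("Product 1", "Product 5"), each = 2),
                 startYear = c(2014, 2015, 2014, 2015),
                 total     = rep(c("30000", "10000"), each = 2),
                 daily     = rep(c("150", "80"), each = 2))

# Inside j, the name DT refers to the whole table, so every group
# (here each group is a single row) returns nrow(DT) ranks
bad <- DT[, .(rank = frankv(DT, cols = c("total", "daily"), ties.method = "dense")),
          by = c("policyID", "product", "startYear")]
nrow(bad)  # 4 groups x 4 ranks = 16 rows

# Rank the columns as seen in j: one rank per row, added by reference
DT[, rank := frankv(list(total, daily), ties.method = "dense")]
```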

What am I doing wrong, and how can I fix it? The documentation is not much help; maybe I'm missing something...

Answer 1

For frank you should not quote the column names, but for frankv (the function you are using) you should:

library(data.table)
frank(testDT, total, daily, ties.method="dense")

 [1] 2 2 2 2 2 1 1 1 1 1

frankv(testDT, cols=c("total", "daily"), ties.method="dense")

 [1] 2 2 2 2 2 1 1 1 1 1
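There are two further points worth noting. First, the question asks to sort, but `frank()` and `frankv()` return ranks and leave the row order unchanged, while `setorder()` reorders the table by reference. Second, `total` and `daily` are stored as character, so they are compared as text and "150" sorts before "80". In the question's data the ranks happen to be the same either way, because `total` alone separates the two groups. The sketch below uses a hypothetical three-row table and converts the columns with `as.numeric()`, assuming they are meant to be numbers.

```r
library(data.table)

# Hypothetical rows; values stored as text, as in the question's data
DT <- data.table(total = c("30000", "10000", "30000"),
                 daily = c("80", "80", "150"))

# Sorting the character columns: "150" comes before "80" (text comparison)
chr_sorted <- setorder(copy(DT), total, daily)

# Convert to numeric first, then sort by reference
num <- copy(DT)[, c("total", "daily") := lapply(.SD, as.numeric),
                .SDcols = c("total", "daily")]
setorder(num, total, daily)
```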

frank - specifying multiple columns from data.table in R

r

data.table