frank - specifying multiple columns from data.table in R
I'm confused by the frank function. The documentation for its ... argument says:
Only for lists, data.frames and data.tables. The columns to calculate
ranks based on. Do not quote column names. If ... is missing, all
columns are considered by default. To sort by a column in descending
order prefix a "-", e.g., frank(x, a, -b, c). The -b works when b is
of type character as well.
Here is my data:
structure(list(product = c("Product 1", "Product 1", "Product 1",
"Product 1", "Product 1", "Product 5", "Product 5", "Product 5",
"Product 5", "Product 5"), policyID = c("A738-33", "A738-33",
"A738-33", "A738-33", "A738-33", "A738-33", "A738-33",
"A738-33", "A738-33", "A738-33"), startYear = c(2014,
2015, 2016, 2017, 2018, 2014, 2015, 2016, 2017, 2018), total = c("30000",
"30000", "30000", "30000", "30000", "10000", "10000", "10000",
"10000", "10000"), daily = c("150", "150", "150", "150", "150",
"80", "80", "80", "80", "80")), class = c("data.table", "data.frame"
), row.names = c(NA, -10L), .internal.selfref = <pointer: 0x7feec50126e0>, sorted = "product")
I want to rank this data by the columns total and daily, so I did this:
> setDT(testDT)
> frankv(testDT, totallimit, rbddaily, ties.method="dense")
Error in colnamesInt(x, cols, check_dups = TRUE) :
argument specifying columns specify non existing column(s): cols[1]='30000'
Strangely, when I quote the column names, which is exactly the opposite of what the documentation says, I do get a result:
frankv(testDT, cols=c("totallimit", "rbddaily"), ties.method="dense")
I also tried to do this inside a data.table call, and then another strange thing happened: from my 10 rows of data I got 100 rows back.
testDT[,.(rank = frankv(testDT, cols=c("limit", "daily"), ties.method="dense")), by = c("policyID", "product", "startYear")]
What am I doing wrong, and how can I fix it? The documentation isn't much help; maybe I'm missing something...
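With frank you should not quote the column names, but with frankv (the function you are using) you should, passing them as a character vector in cols: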
library(data.table)
frank(testDT, total, daily, ties.method="dense")
[1] 2 2 2 2 2 1 1 1 1 1
frankv(testDT, cols=c("total", "daily"), ties.method="dense")
[1] 2 2 2 2 2 1 1 1 1 1
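As for the 100 rows: a likely explanation is that the call inside j passes the whole testDT to frankv, so all 10 ranks are computed again for each of the 10 groups defined by by. The sketch below rebuilds the data from the dput output in the question (the rep() calls are only a shorthand for the same values) and shows one possible fix, which is to drop by and assign the rank with :=. The column name rank and the helper object bad are just illustrative names. The last two lines also show the descending-order forms of both functions.

```r
library(data.table)

# Rebuild the question's data; same values as the dput output above.
testDT <- data.table(
  product   = rep(c("Product 1", "Product 5"), each = 5),
  policyID  = "A738-33",
  startYear = rep(c(2014, 2015, 2016, 2017, 2018), times = 2),
  total     = rep(c("30000", "10000"), each = 5),
  daily     = rep(c("150", "80"), each = 5)
)

# Reproduce the problem: frankv() is given the full table in every group,
# so each of the 10 groups returns 10 ranks.
bad <- testDT[, .(rank = frankv(testDT, cols = c("total", "daily"),
                                ties.method = "dense")),
              by = c("policyID", "product", "startYear")]
nrow(bad)

# One possible fix: rank the whole table once and add the result as a column.
testDT[, rank := frankv(testDT, cols = c("total", "daily"),
                        ties.method = "dense")]
testDT$rank

# Descending order: unquoted "-" prefix for frank, order = -1L for frankv.
desc_frank  <- frank(testDT, -total, ties.method = "dense")
desc_frankv <- frankv(testDT, cols = "total", order = -1L, ties.method = "dense")
```

Note that total and daily are character columns here, so they are ranked as strings rather than as numbers. If numeric ordering matters, converting them with as.numeric before ranking is probably safer.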