Alternative to the cut function in R for data.tables: integer variables to factors

Question

I want to convert the integer variable hp into a categorical variable, using bins of width 10.

library(data.table)
mtcars <- as.data.table(mtcars)
mtcars[, hp_cat := cut(hp,
    breaks = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, Inf),
    include.lowest = TRUE)]

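This produces the desired result, but writing out all the numbers is quite tedious. Is there a faster way? Ideally, the alternative would also produce nicer factor names.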

Note: I want the result in a data.table... so no dplyr.

Answer 1

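Just use the seq() function. Depending on your data, you may also want -Inf as the first element of the vector. In addition, the labels argument lets you name the categories; with the breaks below, labels = paste0("Group", 2:length(BRKS)) works, because it supplies exactly one label per interval (length(BRKS) - 1 labels).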

BRKS <- c(seq(0, 160, 10), Inf)

mtcars[, hp_cat := cut(hp, breaks = BRKS, include.lowest = TRUE)]
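For the "nicer factor names" part of the question, here is a minimal sketch that builds the labels vector from the same breaks, so the levels read like "100-110" and "160+" instead of "(100,110]". The label format, the copy named dt, and the helper vectors lower, upper and LABS are illustrative choices of mine, not part of the answer above. Keep in mind that cut() intervals are right-closed, so "100-110" means (100, 110].

```r
library(data.table)

# Work on a data.table copy so the built-in mtcars data frame is left untouched
dt <- as.data.table(mtcars)

# Breaks 0, 10, ..., 160, then Inf to catch every hp above 160
BRKS <- c(seq(0, 160, 10), Inf)

# Build one readable label per interval from adjacent break pairs,
# e.g. "0-10", ..., "150-160", and "160+" for the open-ended last bin
lower <- head(BRKS, -1)
upper <- tail(BRKS, -1)
LABS <- ifelse(is.infinite(upper), paste0(lower, "+"), paste0(lower, "-", upper))

dt[, hp_cat := cut(hp, breaks = BRKS, labels = LABS, include.lowest = TRUE)]

# Count cars per category
print(dt[, .N, keyby = hp_cat])
```

Because the labels are derived from BRKS, changing the bin width or the upper cutoff only requires editing the seq() call.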

Answer 2

Another option, which should be faster because it avoids cut():

mtcars[, hp_cat2 := ceiling(hp/10)*10][hp_cat2 > 160, hp_cat2 := Inf]

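This names each category by the upper limit of its interval (e.g. 110 for hp in (100, 110]), which gives you nicer names than cut()'s default labels. Note that the result is a numeric column, not a factor.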
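If you do need a factor, a minimal sketch is to wrap the numeric upper limits in factor() afterwards. The copy named dt and the cross-check against cut() are my own additions for illustration. The check assumes hp is strictly positive: for hp = 0, cut() with include.lowest = TRUE puts the value in [0, 10] while ceiling() gives 0, so the two would disagree there.

```r
library(data.table)

dt <- as.data.table(mtcars)

# Upper limit of each 10-unit interval; anything above 160 is lumped into Inf
dt[, hp_cat2 := ceiling(hp / 10) * 10][hp_cat2 > 160, hp_cat2 := Inf]

# Cross-check against cut(): labels = FALSE returns the interval index,
# so BRKS[idx + 1] is that interval's upper bound
BRKS <- c(seq(0, 160, 10), Inf)
idx <- cut(dt$hp, breaks = BRKS, include.lowest = TRUE, labels = FALSE)
stopifnot(all(BRKS[idx + 1] == dt$hp_cat2))

# Convert to a factor; factor() sorts the unique numeric values,
# so the levels come out in numeric order with "Inf" last
dt[, hp_cat2 := factor(hp_cat2)]

print(levels(dt$hp_cat2))
```

Replacing the whole column with := is allowed to change its type, so the numeric column simply becomes a factor column.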
